Abstract

In this paper, we present a generic framework to extend existing uniformly optimal convex 
programming algorithms to solve more general nonlinear, possibly nonconvex, optimization problems.
The basic idea is to incorporate a local search step (gradient descent or Quasi-Newton
iteration) into these uniformly optimal convex programming methods, and then enforce a monotone
decreasing property of the function values computed along the trajectory. Algorithms of
these types will then achieve the best known complexity for nonconvex problems, and the optimal 
complexity for convex ones without requiring any problem parameters. As a consequence, we 
can have a unified treatment for a general class of nonlinear programming problems regardless 
of their convexity and smoothness level. In particular, we show that the accelerated gradient 
and level methods, both originally designed for solving convex optimization problems only, can 
be used for solving both convex and nonconvex problems uniformly. In a similar vein, we show 
that some well-studied techniques for nonlinear programming, e.g., Quasi-Newton iteration, can 
be embedded into optimal convex optimization algorithms to possibly further enhance their
numerical performance. Our theoretical and algorithmic developments are complemented by some
promising numerical results obtained for solving a few important nonconvex and nonlinear data 
analysis problems in the literature. 

keywords nonconvex optimization, uniformly optimal methods, parameter free methods, gradient 
descent methods, Quasi-Newton methods, accelerated gradient methods, accelerated level methods 
1 Introduction

In this paper, we consider the following nonlinear programming problem 

$$\Psi^* := \min_{x \in X}\left\{\Psi(x) := f(x) + \chi(x)\right\}, \tag{1.1}$$


*August, 2015. This research was partially supported by NSF grants CMMI-1254446, CMMI-1537414, DMS-1319050, DMS-1016204 and ONR grant N00014-13-1-0036.

†sghadimi@ufl.edu, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611.

‡glan@ise.ufl.edu, http://www.ise.ufl.edu/glan, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611.

§hozhang@math.lsu.edu, https://www.math.lsu.edu/~hozhang, Department of Mathematics, Louisiana State University, Baton Rouge, LA 70803.





where $X \subseteq \mathbb{R}^n$ is a closed convex set, $f: X \to \mathbb{R}$ is possibly nonconvex, and $\chi$ is a simple but possibly non-differentiable convex function with known structure (e.g., $\chi(x) = \|x\|_1$, or $\chi$ even vanishes). Moreover, we assume that $f$ has Hölder continuous gradient on $X$, i.e., there exist $H > 0$ and $\nu \in [0,1]$ such that

$$\|f'(y) - f'(x)\| \le H\|y - x\|^{\nu} \quad \forall x, y \in X, \tag{1.2}$$

where $f'(x)$ is the gradient of $f$ at $x$ and $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^n$. The assumption in (1.2) covers a wide class of objective functions, including smooth functions (with $\nu = 1$), weakly smooth functions (with $\nu \in (0,1)$), and nonsmooth convex functions with bounded subgradients (with $\nu = 0$).
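To make the role of $\nu$ in (1.2) concrete, the sketch below uses a one-dimensional example of our own choosing (not taken from this paper): $f(x) = |x|^{1+\nu}/(1+\nu)$, whose (sub)gradient $f'(x) = \mathrm{sign}(x)|x|^{\nu}$ satisfies (1.2) with $H = 2^{1-\nu}$. It estimates the Hölder ratio on random pairs; the helper names are hypothetical.

```python
import numpy as np

def holder_grad(x, nu):
    """(Sub)gradient f'(x) = sign(x) * |x|**nu of f(x) = |x|**(1+nu)/(1+nu)."""
    return np.sign(x) * np.abs(x) ** nu

def empirical_holder_ratio(nu, n_pairs=100_000, seed=0):
    """Largest observed |f'(y) - f'(x)| / |y - x|**nu over random pairs in [-5, 5]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, n_pairs)
    y = rng.uniform(-5.0, 5.0, n_pairs)
    keep = x != y                      # avoid 0/0 when nu > 0
    x, y = x[keep], y[keep]
    num = np.abs(holder_grad(y, nu) - holder_grad(x, nu))
    den = np.abs(y - x) ** nu
    return float(np.max(num / den))

if __name__ == "__main__":
    for nu in (0.0, 0.5, 1.0):
        # nu = 0: nonsmooth |x|; nu in (0,1): weakly smooth; nu = 1: smooth x**2/2
        print(f"nu={nu}: observed ratio {empirical_holder_ratio(nu):.4f}, "
              f"H = 2**(1-nu) = {2.0 ** (1.0 - nu):.4f}")
```

For $\nu = 0$ the ratio reaches $H = 2$ whenever $x$ and $y$ have opposite signs (the bounded-subgradient case), while for $\nu = 1$ it equals 1 (the Lipschitz case).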

The complexity of solving (1.1) is well understood in the convex setting, i.e., when $f$ is convex. According to the classic complexity theory by Nemirovski and Yudin [22], if $f$ is a general convex function with bounded subgradients (i.e., $\nu = 0$), then the number of subgradient evaluations of $f$ required to find a solution $\bar x \in X$ such that $\Psi(\bar x) - \Psi^* \le \epsilon$ cannot be smaller than $\mathcal{O}(1/\epsilon^2)$ when $n$ is sufficiently large. Here $\Psi^*$ denotes the optimal value of (1.1). Such a lower complexity bound can be achieved by different first-order methods, including the subgradient and mirror descent methods in [22], and the bundle-level type methods in [18, 3]. Moreover, if $f$ is a smooth convex function with Lipschitz continuous gradients (i.e., $\nu = 1$), then the number of gradient evaluations of $f$ cannot be smaller than $\mathcal{O}(1/\sqrt{\epsilon})$ for sufficiently large $n$. Such a lower complexity bound can be achieved by the well-known Nesterov's accelerated gradient (AG) method, which was originally developed in [23] for the smooth case with $\chi = 0$, and recently extended to an emerging class of composite problems with a relatively simple nonsmooth term $\chi$ [25, 2, 30].

While traditionally different classes of convex optimization problems were solved by using different algorithms, the last few years have seen a growing interest in the development of unified algorithms that can achieve the optimal complexity for solving different classes of convex problems, preferably without requiring the input of any problem parameters. Lan showed in [13] that Nesterov's AG method in [23, 24] can achieve the optimal complexity for solving not only smooth, but also general nonsmooth optimization problems (i.e., $\nu = 0$ in (1.2)) by introducing a novel stepsize policy and some new convergence analysis techniques. Devolder, Glineur and Nesterov [29] further generalized this development and showed that the AG method can exhibit the optimal $\mathcal{O}(1/\epsilon^{\frac{2}{1+3\nu}})$ complexity for solving weakly smooth problems. These methods in [13, 29] still require the input of problem parameters like $\nu$ and $H$, and even the iteration limit $N$. In a different research line, Lan [15] generalized the bundle-level type methods, originally designed for nonsmooth problems, to both smooth and weakly smooth problems. He showed that these accelerated level methods are uniformly optimal for convex optimization in the sense that they can achieve the optimal complexity bounds without requiring the input of any problem parameters or the iteration limit. Simplified variants of these methods were also proposed in [6] for solving ball-constrained and unconstrained convex optimization problems. Recently, Nesterov [26] presented another uniformly optimal method, namely, the universal accelerated gradient method for nonsmooth and (weakly) smooth convex programming. This method only needs the target accuracy as an input, which, similar to the accelerated level methods above, completely removes the need to input the problem parameters required by the methods in [13, 29].

In general, all the above-mentioned methods require the convexity of the objective function to establish their convergence results. When $f$ is possibly nonconvex, a different termination criterion based on the projected gradient $g_{X,k}$ (see, e.g., (2.7)) is often employed to analyze the complexity of the solution methods. While there is no known lower iteration complexity bound for first-order methods to solve problem (1.1), the (projected) gradient-type methods [24, 5, 12] achieve the best-known iteration complexity $\mathcal{O}(1/\epsilon)$ to find a solution such that $\|g_{X,k}\|^2 \le \epsilon$ when $f$ in (1.1) has Lipschitz continuous gradient. Recently, Ghadimi and Lan [10] generalized Nesterov's AG method to solve this class of nonconvex nonlinear optimization problems. They showed that this generalized AG method can not only achieve the best-known $\mathcal{O}(1/\epsilon)$ iteration complexity for finding approximate stationary points of nonconvex problems, but also exhibit the optimal iteration complexity if the objective function turns out to be convex. However, in order to apply this method, we need to assume that all the generated iterates lie in a bounded set and that the gradients of $f$ are Lipschitz continuous, and the method also requires the input of a few problem parameters a priori. Our main goal in this paper is to understand whether we can generalize some of the aforementioned uniformly optimal methods to solve a broader class of nonlinear programming problems given in (1.1), where the function $f$ could be nonconvex and only weakly smooth or nonsmooth (see (1.2)). In addition to these theoretical aspects, our study has also been motivated by the following applications.

• In many machine learning problems, the regularized loss function in the objective is given as a summation of convex and nonconvex terms. A unified approach may help us exploit the possible local convexity of the objective function in this class of problems, even though these problems are not globally convex.

• Some optimization problems are given through a black-box oracle. Hence, both the smoothness level of the objective function and its convexity are unknown. A unified algorithm for both convex and nonconvex optimization that handles the smoothness level of the objective function automatically could achieve better convergence performance for this class of problems.

Our contribution in this paper mainly consists of the following aspects. First, we generalize Nesterov's AG method and present a unified accelerated gradient (UAG) method for solving a subclass of problem (1.1), where $f$ has Lipschitz continuous gradients on $X$, i.e., there exists $L > 0$ such that

$$\|f'(y) - f'(x)\| \le L\|y - x\| \quad \text{for any } x, y \in X. \tag{1.3}$$

Note that the above relation is a special case of (1.2) with $\nu = 1$ and $H$ replaced by $L$. The basic idea of this method is to combine a gradient descent step with Nesterov's AG method and maintain the monotonicity of the objective function value at the iterates generated by the algorithm. Hence, the UAG method contains the gradient projection and Nesterov's AG methods as special cases (see the discussions after presenting Algorithm 1). We show that this UAG method is uniformly optimal for the above-mentioned class of nonlinear programming in the sense that it achieves the best known iteration complexity ($\mathcal{O}(1/\epsilon)$) to find at least one $k$ such that $\|g_{X,k}\|^2 \le \epsilon$, and exhibits the optimal complexity ($\mathcal{O}(1/\sqrt{\epsilon})$) to find a solution $\bar x \in X$ such that $\Psi(\bar x) - \Psi^* \le \epsilon$ if $f$ turns out to be convex. While these results have also been established in [10] for the generalization of Nesterov's AG method, the UAG method does not require the boundedness assumption on the generated trajectory.

Second, we generalize the UAG method for solving a broader class of problems with $\nu \in [0,1]$ in (1.2), and present a unified problem-parameter free accelerated gradient (UPFAG) method. We show that, under the convexity assumption on $f$ and similar to the methods in [29, 26], this method achieves the optimal complexity bound

$$\mathcal{O}\left(\left(\frac{H\|x_0 - x^*\|^{1+\nu}}{\epsilon}\right)^{\frac{2}{1+3\nu}}\right), \tag{1.4}$$

and it also exhibits the best-known iteration complexity

$$\mathcal{O}\left(\frac{H^{\frac{1}{\nu}}\left(\Psi(x_0) - \Psi^*\right)}{\epsilon^{\frac{1+\nu}{2\nu}}}\right) \tag{1.5}$$

to reduce the squared norm of the projected gradient within $\epsilon$-accuracy for nonconvex optimization. To the best of our knowledge, this is the first time that a uniformly optimal algorithm, which does not require any problem parameter information but only takes the target accuracy and a few user-defined line search parameters as the input, has been presented for solving smooth, nonsmooth, and weakly smooth convex and nonconvex optimization. Moreover, this algorithm can also exploit a more efficient Quasi-Newton step rather than the gradient descent step for achieving the same iteration complexity bounds.

Third, by incorporating a gradient descent step into the framework of the bundle-level type methods, namely, the accelerated prox-level (APL) method presented in [15], we propose a unified APL (UAPL) method for solving the class of nonlinear programming problems defined in (1.1), where $f$ satisfies (1.2). We show that this method achieves the complexity bounds in (1.4) and (1.5) for both convex and nonconvex optimization, implying that it is uniformly optimal for solving the aforementioned class of nonlinear programming. Moreover, we simplify this method and present its fast variant, by incorporating a gradient descent step into the framework of the fast APL method [6], for solving ball-constrained and unconstrained problems. To the best of our knowledge, this is the first time that the bundle-level type methods are generalized to these nonconvex nonlinear programming problems.

The rest of the paper is organized as follows. In Section 2, we present the UAG method for solving a class of nonlinear programming problems where the objective function is the summation of a Lipschitz continuously differentiable function and a simple convex function, and establish its convergence results. We then generalize this method in Section 3 for solving a broader class of problems where the Lipschitz continuously differentiable function in the objective is replaced by a weakly smooth function with Hölder continuous gradient. In Section 4, we provide different variants of the bundle-level type methods for solving the aforementioned class of nonlinear programming. In Section 5, we present some numerical illustrations of the above-mentioned algorithms.

Notation. For a differentiable function $h: \mathbb{R}^n \to \mathbb{R}$, $h'(x)$ is the gradient of $h$ at $x$. More generally, when $h$ is a proper convex function, $\partial h(x)$ denotes the subdifferential set of $h$ at $x$. For $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$, $\langle x, y\rangle$ is the standard inner product in $\mathbb{R}^n$. The norm $\|\cdot\|$ is the Euclidean norm given by $\|x\| = \sqrt{\langle x, x\rangle}$, while $\|x\|_G = \sqrt{\langle Gx, x\rangle}$ for a positive definite matrix $G$. Moreover, we let $B(\bar x, r)$ be the ball with radius $r$ centered at $\bar x$, i.e., $B(\bar x, r) = \{x \in \mathbb{R}^n \mid \|x - \bar x\| \le r\}$. We denote by $I$ the identity matrix. For any real number $r$, $\lceil r \rceil$ and $\lfloor r \rfloor$ denote the nearest integer to $r$ from above and below, respectively. $\mathbb{R}_+$ denotes the set of nonnegative real numbers.


2 Unified accelerated gradient method 

Our goal in this section is to present a unified gradient-type method to solve problem (1.1) when $f$ is Lipschitz continuously differentiable. This method automatically attains the optimal theoretical convergence rate without explicitly knowing whether $f$ in (1.1) is convex or not. Compared with the optimal accelerated gradient method presented in [10], our algorithm does not need the uniform boundedness assumption on the iterates generated by the algorithm. Throughout this section, we assume that the gradient of $f$ in (1.1) is $L$-Lipschitz continuous on $X$, i.e., (1.3) holds, which consequently implies

$$|f(y) - f(x) - \langle f'(x), y - x\rangle| \le \frac{L}{2}\|y - x\|^2 \quad \forall x, y \in X. \tag{2.1}$$

The unified accelerated gradient (UAG) algorithm is described as follows.


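Algorithm 1 The unified accelerated gradient (UAG) algorithm

Input: $x_0 \in X$, $\{\alpha_k\}$ s.t. $\alpha_1 = 1$ and $\alpha_k \in (0,1)$ for any $k \ge 2$, $\{\lambda_k > 0\}$, $\{\beta_k > 0\}$ and $\{\gamma_k > 0\}$.

0. Set the initial points $x_0^{ag} = x_0$ and $k = 1$.

1. Set

$$x_k^{md} = (1 - \alpha_k)x_{k-1}^{ag} + \alpha_k x_{k-1}. \tag{2.2}$$

2. Compute $f'(x_k^{md})$ and $f'(x_{k-1}^{ag})$, and set

$$x_k = \arg\min_{u \in X}\left\{\langle f'(x_k^{md}), u\rangle + \frac{1}{2\lambda_k}\|u - x_{k-1}\|^2 + \chi(u)\right\}, \tag{2.3}$$

$$\tilde x_k^{ag} = (1 - \alpha_k)x_{k-1}^{ag} + \alpha_k x_k, \tag{2.4}$$

$$\bar x_k^{ag} = \arg\min_{u \in X}\left\{\langle f'(x_{k-1}^{ag}), u\rangle + \frac{1}{2\beta_k}\|u - x_{k-1}^{ag}\|^2 + \chi(u)\right\}. \tag{2.5}$$

3. Choose $x_k^{ag}$ such that

$$\Psi(x_k^{ag}) = \min\left\{\Psi(\bar x_k^{ag}), \Psi(\tilde x_k^{ag})\right\}. \tag{2.6}$$

4. Set $k \leftarrow k + 1$ and go to step 1.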
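The following is a minimal NumPy sketch of Algorithm 1, assuming $X = \mathbb{R}^n$ and $\chi(x) = \rho\|x\|_1$, so that the subproblems (2.3) and (2.5) reduce to soft-thresholding. The function names, the nonconvex test function and the value of $\rho$ are hypothetical; the parameters in the usage lines follow the policy (2.25) of Corollary 3 with $L = 2$, which is a valid Lipschitz constant for the gradient of the chosen test function.

```python
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau*||.||_1: componentwise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def uag(f, grad, rho, x0, alpha, lam, beta, n_iter):
    """Sketch of Algorithm 1 (UAG) assuming X = R^n and chi(x) = rho*||x||_1.

    alpha, lam, beta are callables k -> alpha_k, lambda_k, beta_k.
    Returns the last x_k^ag and the history of Psi(x_k^ag)."""
    psi = lambda z: f(z) + rho * np.sum(np.abs(z))
    x = x0.copy()        # x_{k-1}
    x_ag = x0.copy()     # x_{k-1}^ag
    history = [psi(x_ag)]
    for k in range(1, n_iter + 1):
        a, l, b = alpha(k), lam(k), beta(k)
        x_md = (1 - a) * x_ag + a * x                             # (2.2)
        x_new = soft_threshold(x - l * grad(x_md), l * rho)       # (2.3)
        x_tilde = (1 - a) * x_ag + a * x_new                      # (2.4)
        x_bar = soft_threshold(x_ag - b * grad(x_ag), b * rho)    # (2.5)
        x_ag = x_bar if psi(x_bar) <= psi(x_tilde) else x_tilde   # (2.6)
        x = x_new
        history.append(psi(x_ag))
    return x_ag, history

# Hypothetical nonconvex test function f(x) = sum log(1 + (x_i - c_i)^2);
# its Hessian is diagonal with entries in [-1/4, 2], so L = 2 works.
c = np.array([1.0, -2.0, 3.0])
f = lambda z: float(np.sum(np.log1p((z - c) ** 2)))
grad = lambda z: 2 * (z - c) / (1 + (z - c) ** 2)
L = 2.0
x_ag, hist = uag(f, grad, rho=0.1, x0=np.zeros(3),
                 alpha=lambda k: 2 / (k + 1), lam=lambda k: k / (2 * L),
                 beta=lambda k: 1 / L, n_iter=200)
```

Because $\beta_k = 1/L$ here, inequality (2.17) implies that the recorded values $\Psi(x_k^{ag})$ are non-increasing even though $f$ is nonconvex.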


We now add a few remarks about the above UAG algorithm. First, observe that by just considering (2.2), (2.3), (2.4) and setting $x_k^{ag} = \tilde x_k^{ag}$, Algorithm 1 would reduce to a variant of the well-known Nesterov's optimal gradient method. Moreover, if $x_{k-1}^{ag}$ is replaced by $x_k^{md}$ in (2.5) and $x_k^{ag} = \bar x_k^{ag}$, then (2.2), (2.3) and (2.5) would give the accelerated gradient (AG) method proposed by Ghadimi and Lan [10]. However, when $f$ is nonconvex, the convergence of this AG method in [10] requires the boundedness assumption on the iterates, as mentioned before. On the other hand, by just considering (2.5) and setting $x_k^{ag} = \bar x_k^{ag}$, the UAG algorithm would be a variant of the projected gradient method [12]. Indeed, it follows from these observations that we can possibly perform the convergence analysis of the UAG method for convex and nonconvex optimization separately (see the discussions after Corollary 3).

Second, relation (2.6) guarantees that the objective function value at the iterates $x_k^{ag}$ generated by the UAG algorithm is non-increasing. Such a monotonicity of the objective function value, as shown in the proof of Theorem 2 a), is required to establish convergence of the algorithm when $\Psi$ is nonconvex.

Finally, noticing that $\chi$ in problem (1.1) is not necessarily differentiable and $f$ in (1.1) may not be a convex function, we need to define a termination criterion when the objective function $\Psi$ is not convex. In this case, we would terminate the algorithm when the norm of the generalized projected gradient defined by

$$g_{X,k} = \frac{x_{k-1}^{ag} - \bar x_k^{ag}}{\beta_k} \tag{2.7}$$

is sufficiently small. Note that $g_{X,k} = f'(x_{k-1}^{ag})$ when $\chi$ vanishes and $X = \mathbb{R}^n$. Indeed, the above generalized projected gradient in constrained nonsmooth optimization plays a role analogous to that of the gradient in unconstrained smooth optimization. In particular, it can be shown that if $\|g_{X,k}\| \le \epsilon$, then $\Psi'(\bar x_k^{ag}) \in -N_X(\bar x_k^{ag}) + B(\epsilon(L\beta_k + 1))$, where $\Psi'(\bar x_k^{ag}) \in \partial\Psi(\bar x_k^{ag})$, $N_X(\bar x_k^{ag})$ is the normal cone of $X$ at $\bar x_k^{ag}$, and $B(r) := \{x \in \mathbb{R}^n : \|x\| \le r\}$ (see, e.g., [11]).
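Under the illustrative assumption $X = \mathbb{R}^n$ and $\chi = \rho\|\cdot\|_1$ (our choice, not required by the paper), the generalized projected gradient (2.7) can be computed as below; the helper names are hypothetical. With $\rho = 0$ it reduces to $f'(x_{k-1}^{ag})$, and at $x_{k-1}^{ag} = 0$ it vanishes whenever $|f'(0)| \le \rho$ componentwise, which is the stationarity condition of the $\ell_1$-regularized problem at the origin.

```python
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def generalized_projected_gradient(x_prev_ag, grad_prev_ag, beta, rho):
    """g_{X,k} in (2.7), assuming X = R^n and chi(x) = rho*||x||_1,
    so that xbar_k^ag from (2.5) is a soft-thresholding step."""
    x_bar = soft_threshold(x_prev_ag - beta * grad_prev_ag, beta * rho)
    return (x_prev_ag - x_bar) / beta
```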

To establish the convergence of the above UAG algorithm, we first need the following simple technical result (a slightly more general result is given as Lemma 3 in an earlier reference).

Lemma 1 Let $\{\alpha_k\}$ be a sequence of real numbers such that $\alpha_1 = 1$ and $\alpha_k \in (0,1)$ for any $k \ge 2$. If a sequence $\{\omega_k\}$ satisfies

$$\omega_k \le (1 - \alpha_k)\omega_{k-1} + \zeta_k, \quad k = 1, 2, \ldots, \tag{2.8}$$

then for any $k \ge 1$ we have

$$\omega_k \le \Gamma_k\sum_{i=1}^{k}\frac{\zeta_i}{\Gamma_i},$$

where

$$\Gamma_k := \begin{cases} 1, & k = 1, \\ (1 - \alpha_k)\Gamma_{k-1}, & k \ge 2. \end{cases} \tag{2.9}$$
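A quick numerical check of Lemma 1 (a sketch; the helper names are ours): when (2.8) holds with equality, the bound $\Gamma_k\sum_{i\le k}\zeta_i/\Gamma_i$ is attained exactly, and for $\alpha_k = 2/(k+1)$ the recursion (2.9) gives $\Gamma_k = 2/(k(k+1))$.

```python
import numpy as np

def gammas(alphas):
    """Gamma_k from (2.9); alphas[0] is alpha_1 (= 1), alphas[k-1] is alpha_k."""
    G = np.empty(len(alphas))
    G[0] = 1.0
    for i in range(1, len(alphas)):
        G[i] = (1.0 - alphas[i]) * G[i - 1]
    return G

def lemma1_bound(alphas, zetas):
    """Right-hand side Gamma_k * sum_{i<=k} zeta_i / Gamma_i of Lemma 1, for every k."""
    G = gammas(alphas)
    return G * np.cumsum(zetas / G)
```

Since $\alpha_1 = 1$, the initial value $\omega_0$ drops out of the recursion, which is why the bound does not depend on it.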


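Below, we give the main convergence properties of the UAG algorithm.

Theorem 2 Let $\{x_k^{ag}\}$ be the iterates generated by Algorithm 1 and $\Gamma_k$ be defined in (2.9).

a) Suppose that $\Psi$ is bounded below over $X$, i.e., $\Psi^*$ is finite. If $\{\beta_k\}$ is chosen such that

$$\beta_k \le \frac{2}{L} \tag{2.10}$$

with $\beta_k < 2/L$ for at least one $k$, then for any $N \ge 1$, we have

$$\min_{k=1,\ldots,N}\|g_{X,k}\|^2 \le \frac{\Psi(x_0) - \Psi^*}{\sum_{k=1}^{N}\beta_k\left(1 - \frac{L\beta_k}{2}\right)}. \tag{2.11}$$

b) Suppose that $f$ is convex and an optimal solution $x^*$ exists for problem (1.1). If $\{\alpha_k\}$, $\{\beta_k\}$ and $\{\lambda_k\}$ are chosen such that

$$\alpha_k\lambda_k \le \frac{1}{L} \tag{2.12}$$

and

$$\frac{\alpha_1}{\lambda_1\Gamma_1} \ge \frac{\alpha_2}{\lambda_2\Gamma_2} \ge \cdots, \tag{2.13}$$

then for any $N \ge 1$, we have

$$\Psi(x_N^{ag}) - \Psi(x^*) \le \frac{\Gamma_N\|x_0 - x^*\|^2}{2\lambda_1}. \tag{2.14}$$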
Proof. We first show part a). By (2.1), we have

$$f(\bar x_k^{ag}) \le f(x_{k-1}^{ag}) + \langle f'(x_{k-1}^{ag}), \bar x_k^{ag} - x_{k-1}^{ag}\rangle + \frac{L}{2}\|\bar x_k^{ag} - x_{k-1}^{ag}\|^2. \tag{2.15}$$

Then, it follows from (2.5) and Lemma 2 of [9] that for any $x \in X$ we have

$$\langle f'(x_{k-1}^{ag}), \bar x_k^{ag} - x\rangle + \chi(\bar x_k^{ag}) \le \chi(x) + \frac{1}{2\beta_k}\left[\|x_{k-1}^{ag} - x\|^2 - \|\bar x_k^{ag} - x\|^2 - \|x_{k-1}^{ag} - \bar x_k^{ag}\|^2\right]. \tag{2.16}$$

Letting $x = x_{k-1}^{ag}$ in the above inequality, adding it to (2.15) and noticing (2.6), we obtain

$$\Psi(x_k^{ag}) \le \Psi(\bar x_k^{ag}) \le \Psi(x_{k-1}^{ag}) - \frac{1}{\beta_k}\left(1 - \frac{L\beta_k}{2}\right)\|\bar x_k^{ag} - x_{k-1}^{ag}\|^2. \tag{2.17}$$

Summing up the above inequalities for $k$ from 1 to $N$ and rearranging the terms, it follows from the definition of $g_{X,k}$ in (2.7) and $\Psi^* \le \Psi(x_N^{ag})$ that

$$\sum_{k=1}^{N}\beta_k\left(1 - \frac{L\beta_k}{2}\right)\|g_{X,k}\|^2 = \sum_{k=1}^{N}\frac{1}{\beta_k}\left(1 - \frac{L\beta_k}{2}\right)\|\bar x_k^{ag} - x_{k-1}^{ag}\|^2 \le \Psi(x_0^{ag}) - \Psi(x_N^{ag}) \le \Psi(x_0) - \Psi^*. \tag{2.18}$$

Then, (2.10) and the above inequality imply (2.11).

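We now show part b). Using (2.4) and the convexity of $f$, for any $x \in X$, we have

$$\begin{aligned}
f(x_k^{md}) + \langle f'(x_k^{md}), \tilde x_k^{ag} - x_k^{md}\rangle
&= (1 - \alpha_k)\left[f(x_k^{md}) + \langle f'(x_k^{md}), x_{k-1}^{ag} - x_k^{md}\rangle\right] + \alpha_k\left[f(x_k^{md}) + \langle f'(x_k^{md}), x_k - x_k^{md}\rangle\right] \\
&\le (1 - \alpha_k)f(x_{k-1}^{ag}) + \alpha_k f(x) + \alpha_k\langle f'(x_k^{md}), x_k - x\rangle,
\end{aligned} \tag{2.19}$$

which together with (2.1), (2.2), (2.4), (2.6), and the convexity of $\chi$ imply that

$$\begin{aligned}
\Psi(x_k^{ag}) \le \Psi(\tilde x_k^{ag}) &= f(\tilde x_k^{ag}) + \chi(\tilde x_k^{ag}) \le f(x_k^{md}) + \langle f'(x_k^{md}), \tilde x_k^{ag} - x_k^{md}\rangle + \frac{L}{2}\|\tilde x_k^{ag} - x_k^{md}\|^2 + \chi(\tilde x_k^{ag}) \\
&\le (1 - \alpha_k)f(x_{k-1}^{ag}) + \alpha_k f(x) + \alpha_k\langle f'(x_k^{md}), x_k - x\rangle + \frac{L}{2}\|\tilde x_k^{ag} - x_k^{md}\|^2 + (1 - \alpha_k)\chi(x_{k-1}^{ag}) + \alpha_k\chi(x_k) \\
&= (1 - \alpha_k)\Psi(x_{k-1}^{ag}) + \alpha_k f(x) + \alpha_k\langle f'(x_k^{md}), x_k - x\rangle + \frac{L\alpha_k^2}{2}\|x_k - x_{k-1}\|^2 + \alpha_k\chi(x_k).
\end{aligned} \tag{2.20}$$

Now, by (2.3) and Lemma 2 of [9], for any $x \in X$ we have

$$\langle f'(x_k^{md}), x_k - x\rangle + \chi(x_k) \le \chi(x) + \frac{1}{2\lambda_k}\left[\|x_{k-1} - x\|^2 - \|x_k - x\|^2 - \|x_k - x_{k-1}\|^2\right]. \tag{2.21}$$

Multiplying the above inequality by $\alpha_k$ and summing it up with (2.20), we obtain

$$\Psi(x_k^{ag}) \le (1 - \alpha_k)\Psi(x_{k-1}^{ag}) + \alpha_k\Psi(x) + \frac{\alpha_k}{2\lambda_k}\left[\|x_{k-1} - x\|^2 - \|x_k - x\|^2\right] - \frac{\alpha_k(1 - L\alpha_k\lambda_k)}{2\lambda_k}\|x_k - x_{k-1}\|^2, \tag{2.22}$$

which together with the assumption (2.12) gives

$$\Psi(x_k^{ag}) \le (1 - \alpha_k)\Psi(x_{k-1}^{ag}) + \alpha_k\Psi(x) + \frac{\alpha_k}{2\lambda_k}\left[\|x_{k-1} - x\|^2 - \|x_k - x\|^2\right].$$

Subtracting $\Psi(x)$ from both sides of the above inequality, applying Lemma 1 and dividing the result by $\Gamma_N$, we have, for any $x \in X$,

$$\frac{\Psi(x_N^{ag}) - \Psi(x)}{\Gamma_N} \le \sum_{k=1}^{N}\frac{\alpha_k}{2\lambda_k\Gamma_k}\left[\|x_{k-1} - x\|^2 - \|x_k - x\|^2\right]. \tag{2.23}$$

Now, by (2.13) and $\alpha_1 = \Gamma_1 = 1$, we have

$$\sum_{k=1}^{N}\frac{\alpha_k}{\lambda_k\Gamma_k}\left[\|x_{k-1} - x\|^2 - \|x_k - x\|^2\right] \le \frac{\alpha_1}{\lambda_1\Gamma_1}\|x_0 - x\|^2 = \frac{\|x_0 - x\|^2}{\lambda_1}, \tag{2.24}$$

which together with (2.23) give

$$\frac{\Psi(x_N^{ag}) - \Psi(x)}{\Gamma_N} \le \frac{\|x_0 - x\|^2}{2\lambda_1}.$$

Setting $x = x^*$ in the above inequality directly gives (2.14). ■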

Note that the convergence analysis of the UAG method is completely separable for convex and nonconvex problems in the above proof. This allows us to solve the optimization problems uniformly without having any information about the convexity of $f$. In the next corollary, we specify one particular set of choices of the parameters in the UAG algorithm to obtain a particular convergence rate.

Corollary 3 Suppose that $\{\alpha_k\}$, $\{\beta_k\}$ and $\{\lambda_k\}$ in Algorithm 1 are set to

$$\alpha_k = \frac{2}{k+1}, \quad \lambda_k = \frac{k}{2L} \quad \text{and} \quad \beta_k = \frac{1}{L}. \tag{2.25}$$

a) Suppose that $\Psi$ is bounded below over $X$, i.e., $\Psi^*$ is finite. Then for any $N \ge 1$, we have

$$\min_{k=1,\ldots,N}\|g_{X,k}\|^2 \le \frac{2L[\Psi(x_0) - \Psi^*]}{N}. \tag{2.26}$$

b) Suppose that $f$ is convex and an optimal solution $x^*$ exists for problem (1.1). Then for any $N \ge 1$, we have

$$\Psi(x_N^{ag}) - \Psi(x^*) \le \frac{2L\|x_0 - x^*\|^2}{N(N+1)}. \tag{2.27}$$


Proof. We first show part a). Observe that by (2.25) condition (2.10) holds. Then, (2.26) directly follows from (2.11) and

$$\sum_{k=1}^{N}\beta_k\left(1 - \frac{L\beta_k}{2}\right) = \sum_{k=1}^{N}\frac{1}{2L} = \frac{N}{2L}.$$

We now show part b). Observe that by (2.25), we have

$$\alpha_k\lambda_k = \frac{k}{(k+1)L} \le \frac{1}{L} \quad \text{and} \quad \frac{\alpha_1}{\lambda_1\Gamma_1} = \frac{\alpha_2}{\lambda_2\Gamma_2} = \cdots = 2L,$$

which imply that conditions (2.12) and (2.13) hold. On the other hand, by (2.9) and (2.25), we have

$$\Gamma_N = \frac{2}{N(N+1)},$$

which together with (2.14) clearly imply (2.27). ■
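The conditions verified in the proof of Corollary 3 can also be checked in exact rational arithmetic. The sketch below (the helper name is hypothetical) confirms (2.10) and (2.12) for the policy (2.25), and returns $\Gamma_N$, the ratios $\alpha_k/(\lambda_k\Gamma_k)$ appearing in (2.13), and the sum $\sum_{k=1}^N\beta_k(1 - L\beta_k/2)$ from (2.11).

```python
from fractions import Fraction

def check_corollary3(L, N):
    """Exact check of the policy (2.25): alpha_k = 2/(k+1), lambda_k = k/(2L), beta_k = 1/L."""
    L = Fraction(L)
    Gamma = Fraction(1)
    ratios = []          # alpha_k / (lambda_k * Gamma_k), the terms in (2.13)
    s = Fraction(0)      # sum of beta_k * (1 - L*beta_k/2), the denominator in (2.11)
    for k in range(1, N + 1):
        alpha = Fraction(2, k + 1)
        lam = Fraction(k) / (2 * L)
        beta = 1 / L
        if k > 1:
            Gamma *= 1 - alpha                   # recursion (2.9)
        assert alpha * lam <= 1 / L              # condition (2.12)
        assert beta <= 2 / L                     # condition (2.10)
        ratios.append(alpha / (lam * Gamma))
        s += beta * (1 - L * beta / 2)
    return Gamma, ratios, s
```

For example, with $L = 3$ and $N = 20$ every ratio equals $2L = 6$, $\Gamma_{20} = 2/(20 \cdot 21)$ and the sum equals $20/6 = N/(2L)$.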

We now add a few remarks about the results obtained in Corollary 3. First, note that the UAG method achieves the best known convergence rate for solving nonconvex optimization problems as well as convex optimization problems. Specifically, (2.26) implies that to achieve $\|g_{X,k}\|^2 \le \epsilon$ for at least one $k$, the total number of iterations needed by the UAG method is bounded by

$$\mathcal{O}\left(\frac{L[\Psi(x_0) - \Psi^*]}{\epsilon}\right). \tag{2.28}$$

This bound is also known for the steepest descent method for unconstrained problems [21], and for the projected gradient method for composite problems in Ghadimi, Lan and Zhang [12]. A similar bound is also obtained by the AG method in [10], which, however, for composite problems relies on an additional assumption that the iterates are bounded, as mentioned before. On the other hand, one possible advantage of this AG method in [10] is that it can separate the effects of the Lipschitz constants of smooth convex terms. When $f$ is convex, by (2.27), the UAG method is guaranteed to find a solution $\bar x$ such that $\Psi(\bar x) - \Psi(x^*) \le \epsilon$ in at most

$$\mathcal{O}\left(\sqrt{\frac{L\|x_0 - x^*\|^2}{\epsilon}}\right) \tag{2.29}$$

iterations, which is known to be optimal for solving convex optimization problems [22].
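The orders (2.28) and (2.29) come from inverting (2.26) and (2.27). The hypothetical helpers below carry out this arithmetic with the explicit constants of Corollary 3, returning the smallest $N$ for which the right-hand side of each bound is at most $\epsilon$.

```python
import math

def uag_iterations_nonconvex(L, gap0, eps):
    """Smallest N with 2*L*gap0/N <= eps, from (2.26); gap0 = Psi(x0) - Psi*."""
    return math.ceil(2 * L * gap0 / eps)

def uag_iterations_convex(L, dist0_sq, eps):
    """Smallest N with 2*L*dist0_sq/(N*(N+1)) <= eps, from (2.27); dist0_sq = ||x0 - x*||^2."""
    # N*(N+1) < (N+1)^2, so N + 1 > sqrt(2*L*dist0_sq/eps): start just below that.
    N = max(1, math.ceil(math.sqrt(2 * L * dist0_sq / eps)) - 1)
    while 2 * L * dist0_sq > eps * N * (N + 1):
        N += 1
    return N
```

With $L = 2$, $\Psi(x_0) - \Psi^* = \|x_0 - x^*\|^2 = 3$ and $\epsilon = 0.5$, the nonconvex bound asks for 24 iterations while the convex bound is met after 5, illustrating the gap between $\mathcal{O}(1/\epsilon)$ and $\mathcal{O}(1/\sqrt{\epsilon})$.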

Second, the UAG method does not need to know the convexity of the objective function as prior knowledge. Instead, it treats both convex and nonconvex optimization problems in a unified way. In any case, the UAG method always achieves the complexity bound in (2.28), and when the objective function happens to be convex, it also achieves the optimal complexity bound in (2.29).

Despite the above-mentioned theoretical advantages of the UAG method, it still has some practical drawbacks. One obvious drawback is that the parameter policy in (2.25) requires the knowledge of the Lipschitz constant $L$, which may not be exactly known in practice, and a poor estimate of this Lipschitz constant may severely deteriorate the performance of the method. On the other hand, in many applications the Lipschitz continuity of the gradient of $f$ in (1.1) may not be known either; in fact, the gradient of $f$ may be only Hölder continuous instead of Lipschitz continuous. Furthermore, we can see from the convergence analysis of the case when $f$ is not convex that the UAG method will perform more like the steepest descent method. However, the steepest descent method, although very robust, is usually not an efficient method, as verified in many applications of nonlinear programming. In the next section, we modify the UAG method into a more practical method so that it can be applied to a broader class of problems and be more flexible in taking advantage of some already well-studied efficient methods in nonlinear programming. In addition, no prior knowledge of problem parameters is needed.


3 Unified problem-parameter free accelerated gradient method 


In this section, we consider a broader class of problems including smooth, weakly smooth and nonsmooth objective functions. In particular, we would like to deal with the class of problems in (1.1) such that the gradient of $f$ is Hölder continuous in the sense of (1.2), which also implies

$$|f(y) - f(x) - \langle f'(x), y - x\rangle| \le \frac{H}{1+\nu}\|y - x\|^{1+\nu} \quad \text{for any } x, y \in X. \tag{3.1}$$

Our algorithm is stated below as Algorithm 2, which involves two line search procedures. For this algorithm, we have the following remarks.

First, note that in steps 1 and 2, we implement two independent line search procedures, respectively, in (3.4) and (3.7). Indeed, we start with initial choices of stepsizes $\hat\lambda_k$ and $\hat\beta_k$, and then perform Armijo-type line searches such that certain specially designed line search conditions are satisfied. We will show in Theorem 4 a) that the two line search procedures will finish in a finite number of inner iterations. One simple choice of line search in practice is to set $G_k = I$ for all $k \ge 1$, and set the initial stepsizes to be some Barzilai-Borwein type stepsizes such as


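$$\hat\lambda_k = \max\left\{\frac{\langle s_{k-1}^{md}, y_{k-1}^{md}\rangle}{\langle y_{k-1}^{md}, y_{k-1}^{md}\rangle}, \gamma\right\} \ \text{for } k > 1, \quad \text{and} \quad \hat\beta_k = \max\left\{\frac{\langle s_{k-1}^{ag}, y_{k-1}^{ag}\rangle}{\langle y_{k-1}^{ag}, y_{k-1}^{ag}\rangle}, \gamma\right\} \ \text{for } k > 1, \tag{3.2}$$

where $s_{k-1}^{md} = x_k^{md} - x_{k-1}^{md}$, $y_{k-1}^{md} = f'(x_k^{md}) - f'(x_{k-1}^{md})$, $s_{k-1}^{ag} = x_{k-1}^{ag} - x_{k-2}^{ag}$ and $y_{k-1}^{ag} = f'(x_{k-1}^{ag}) - f'(x_{k-2}^{ag})$. And we can choose $\hat\beta_1 = 1/\hat H$, where $\hat H$ is an estimate of the Hölder continuity constant in (1.2).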
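One common safeguarded Barzilai-Borwein variant, sketched below, uses the ratio $\langle s, y\rangle/\langle y, y\rangle$ bounded from below by a user-chosen floor. Which BB variant and which floor to use are assumptions here, so treat this as an illustration of the idea behind (3.2) rather than a definitive formula; the helper name is hypothetical.

```python
import numpy as np

def bb_initial_stepsize(s, y, floor):
    """Safeguarded BB-type stepsize max{<s,y>/<y,y>, floor}.

    s: difference of two consecutive points; y: difference of the gradients there.
    Returns `floor` when y = 0, i.e., when no curvature information is available."""
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    yy = float(y @ y)
    if yy == 0.0:
        return floor
    return max(float(s @ y) / yy, floor)
```

For a quadratic with $y = 4s$ the ratio is $1/4$, the inverse curvature along $s$; when $\langle s, y\rangle < 0$ (negative curvature, possible for nonconvex $f$) the floor takes over.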

Second, we can include some curvature information of $f$ into a positive definite matrix $G_k$ in (3.8) to obtain a better local approximation of the function $\Psi$ at $x_{k-1}^{ag}$. In this case, a unit initial stepsize is often preferred, that is, we set $\hat\beta_k = 1$ for all $k \ge 1$. In practice, we can set $G_k$ to be some Quasi-Newton matrix, e.g., the well-known BFGS or limited-memory BFGS matrix. Then, when $X = \mathbb{R}^n$ and $\chi(x) \equiv 0$, (3.8) will be exactly a Quasi-Newton step and hence, a fast local convergence rate could be expected in practice. When $X \ne \mathbb{R}^n$ or $\chi(x) \not\equiv 0$, we may not have a closed-form solution for the subproblem (3.8). Then, the alternating direction method of multipliers or primal-dual type algorithms could solve the subproblem (3.8) quite efficiently, since its objective function is just the sum of a simple convex function $\chi$ and a convex quadratic function with a known inverse Hessian. So, in general, by different choices of the matrix $G_k$, many well-studied efficient methods in nonlinear programming could be incorporated into the algorithm.
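Algorithm 2 The unified problem-parameter free accelerated gradient (UPFAG) algorithm

Input: $x_0 \in X$, line search parameters $0 < \gamma < \sigma < 1$, $\gamma_1, \gamma_2 \in (0,1)$, and accuracy parameter $\delta > 0$.

0. Set the initial points $x_0^{ag} = x_0$ and $k = 1$.

1. Choose initial stepsize $\hat\lambda_k > 0$ and find the smallest integer $\tau_{1,k} \ge 0$ such that with

$$\eta_k = \hat\lambda_k\gamma_1^{\tau_{1,k}} \quad \text{and} \quad \lambda_k = \frac{\eta_k + \sqrt{\eta_k^2 + 4\eta_k\Lambda_{k-1}}}{2}, \tag{3.3}$$

the solutions obtained by (2.2), (2.3) and (2.4) satisfy

$$f(\tilde x_k^{ag}) \le f(x_k^{md}) + \alpha_k\langle f'(x_k^{md}), x_k - x_{k-1}\rangle + \frac{\alpha_k}{2\lambda_k}\|x_k - x_{k-1}\|^2 + \delta\alpha_k, \tag{3.4}$$

where

$$\alpha_k = \frac{\lambda_k}{\Lambda_k} \quad \text{and} \quad \Lambda_k = \sum_{i=1}^{k}\lambda_i. \tag{3.5}$$

2. Choose initial stepsize $\hat\beta_k > 0$ and find the smallest integer $\tau_{2,k} \ge 0$ such that with

$$\beta_k = \hat\beta_k\gamma_2^{\tau_{2,k}}, \tag{3.6}$$

we have

$$\Psi(\bar x_k^{ag}) \le \Psi(x_{k-1}^{ag}) - \frac{\gamma}{2\beta_k}\|\bar x_k^{ag} - x_{k-1}^{ag}\|^2 + \frac{1}{k}, \tag{3.7}$$

where

$$\bar x_k^{ag} = \arg\min_{u \in X}\left\{\langle f'(x_{k-1}^{ag}), u\rangle + \frac{1}{2\beta_k}\|u - x_{k-1}^{ag}\|_{G_k}^2 + \chi(u)\right\}, \tag{3.8}$$

and $G_k \succeq \sigma I$ for some $\sigma \in (0,1)$.

3. Choose $x_k^{ag}$ such that

$$\Psi(x_k^{ag}) = \min\left\{\Psi(x_{k-1}^{ag}), \Psi(\tilde x_k^{ag}), \Psi(\bar x_k^{ag})\right\}. \tag{3.9}$$

4. Set $k \leftarrow k + 1$ and go to step 1.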
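A minimal sketch of Algorithm 2 under simplifying assumptions: $X = \mathbb{R}^n$, $\chi(x) = \rho\|x\|_1$, $G_k = I$ (so $G_k \succeq \sigma I$ for any $\sigma \in (\gamma, 1)$), and fixed initial stepsizes in every iteration instead of (3.2) or (3.10). The cap on the number of backtracking steps, the default parameter values, the helper names and the test problem are our own choices. Note that no Lipschitz or Hölder constant is passed in.

```python
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def upfag(f, grad, rho, x0, delta, n_iter, lam_hat=1.0, beta_hat=1.0,
          gamma=0.1, gamma1=0.5, gamma2=0.5, max_backtracks=60):
    """Sketch of Algorithm 2 (UPFAG) assuming X = R^n, chi(x) = rho*||x||_1, G_k = I.

    lam_hat / beta_hat serve as initial stepsizes in every iteration (a
    simplification of (3.2)/(3.10)); max_backtracks is a safety cap."""
    psi = lambda z: f(z) + rho * np.sum(np.abs(z))
    x, x_ag, Lam = x0.copy(), x0.copy(), 0.0     # x_{k-1}, x_{k-1}^ag, Lambda_{k-1}
    history = [psi(x_ag)]
    for k in range(1, n_iter + 1):
        # Step 1: backtrack on eta_k until (3.4) holds.
        for tau in range(max_backtracks):
            eta = lam_hat * gamma1 ** tau
            lam = (eta + np.sqrt(eta ** 2 + 4.0 * eta * Lam)) / 2.0   # (3.3)
            alpha = lam / (Lam + lam)                                 # (3.5)
            x_md = (1 - alpha) * x_ag + alpha * x                     # (2.2)
            g_md = grad(x_md)
            x_new = soft_threshold(x - lam * g_md, lam * rho)         # (2.3)
            x_tilde = (1 - alpha) * x_ag + alpha * x_new              # (2.4)
            d = x_new - x
            rhs = (f(x_md) + alpha * (g_md @ d)
                   + alpha / (2.0 * lam) * (d @ d) + delta * alpha)
            if f(x_tilde) <= rhs:                                     # (3.4)
                break
        Lam += lam
        # Step 2: backtrack on beta_k until (3.7) holds (G_k = I).
        g_ag, psi_ag = grad(x_ag), psi(x_ag)
        for tau in range(max_backtracks):
            beta = beta_hat * gamma2 ** tau                           # (3.6)
            x_bar = soft_threshold(x_ag - beta * g_ag, beta * rho)    # (3.8)
            e = x_bar - x_ag
            if psi(x_bar) <= psi_ag - gamma / (2.0 * beta) * (e @ e) + 1.0 / k:
                break
        # Step 3: keep the best of the three candidates, as in (3.9).
        x_ag = min((x_ag, x_tilde, x_bar), key=psi)
        x = x_new
        history.append(psi(x_ag))
    return x_ag, history

# Hypothetical nonconvex test problem f(x) = sum log(1 + (x_i - c_i)^2).
c = np.array([1.0, -2.0, 3.0])
f = lambda z: float(np.sum(np.log1p((z - c) ** 2)))
grad = lambda z: 2 * (z - c) / (1 + (z - c) ** 2)
x_ag, hist = upfag(f, grad, rho=0.1, x0=np.zeros(3), delta=1e-3, n_iter=100)
```

Since $x_{k-1}^{ag}$ is one of the candidates in (3.9), the recorded values of $\Psi(x_k^{ag})$ are non-increasing even when the $1/k$ slack in (3.7) accepts a step that increases $\Psi$.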


Third, from the complexity point of view, instead of setting the initial stepsizes as in (3.2), we could also take advantage of the line search in the previous iteration and set

$$\hat\lambda_k = \eta_{k-1} \quad \text{and} \quad \hat\beta_k = \beta_{k-1}, \tag{3.10}$$

where $\eta_{k-1}$ and $\beta_{k-1}$ are the accepted stepsizes in the previous $(k-1)$-th iteration. The choice of initial stepsizes in (3.2) is more aggressive and inherits some quasi-Newton information, and hence could perform better in practice. However, the strategy in (3.10) has theoretical advantages in terms of the total number of inner iterations needed in the line search (see the discussion after Corollary 5). Furthermore, notice that the choice of $\alpha_k$ can be different from the one in (3.5). In fact, we only need to choose $\alpha_k$ such that condition (2.13) is satisfied. For simplicity, we use the choice of $\alpha_k$ in (3.5), which satisfies the condition $\lambda_1\alpha_k = \lambda_k\Gamma_k$ due to the definition of $\Gamma_k$ in (2.9). We can easily see that by this choice of $\alpha_k$ we always have $\alpha_1 = 1$ and $\alpha_k \in (0,1)$ for all $k \ge 2$.
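The properties of the choice (3.5) claimed above, together with the identity $\alpha_k\lambda_k = \eta_k$ implied by (3.3) (since $\lambda_k^2 = \eta_k\Lambda_k$), can be checked numerically. The sketch below (hypothetical names) builds $\lambda_k$, $\alpha_k$ and $\Gamma_k$ from arbitrary positive $\eta_k$, taking $\Lambda_0 = 0$ as is consistent with (3.5).

```python
import numpy as np

def stepsize_sequence(etas):
    """lambda_k, alpha_k and Gamma_k built from accepted eta_k via (3.3), (3.5) and (2.9)."""
    Lam = 0.0                                                      # Lambda_0 = 0
    Gamma = 1.0
    lams, alphas, gammas = [], [], []
    for k, eta in enumerate(etas, start=1):
        lam = (eta + np.sqrt(eta ** 2 + 4.0 * eta * Lam)) / 2.0   # (3.3)
        Lam += lam                                                 # Lambda_k
        alpha = lam / Lam                                          # (3.5)
        if k > 1:
            Gamma *= 1.0 - alpha                                   # (2.9)
        lams.append(lam)
        alphas.append(alpha)
        gammas.append(Gamma)
    return np.array(lams), np.array(alphas), np.array(gammas)
```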

Finally, (3.9) has one more term, $\Psi(x_{k-1}^{ag})$, in the minimum than (2.6). This extra term is designed to guarantee that $\Psi(x_k^{ag})$ is monotonically non-increasing. Since the stepsizes in Algorithm 1 are set in advance, such an extra term is not needed there; in particular, $\Psi(x_k^{ag})$ in Algorithm 1 is non-increasing due to (2.17).

Below, we present the main convergence properties of the UPFAG algorithm. 


Theorem 4 Let $\{x_k^{ag}\}$ be the iterates generated by Algorithm 2 and $\Gamma_k$ be defined in (2.9).

a) The line search procedures in Step 1 and Step 2 of the algorithm will finish in a finite number of inner iterations.

b) Suppose that $\Psi$ is bounded below over $X$, i.e., $\Psi^*$ is finite. Then, for any $N \ge 1$, we have

$$\min_{k=1,\ldots,N}\|\tilde g_{X,k}\|^2 \le \frac{2\left[\Psi(x_0) - \Psi^* + \sum_{k=\lfloor N/2\rfloor+1}^{N}k^{-1}\right]}{\gamma\sum_{k=\lfloor N/2\rfloor+1}^{N}\beta_k}, \tag{3.11}$$

where $\tilde g_{X,k} = (x_{k-1}^{ag} - \bar x_k^{ag})/\beta_k$ and $\bar x_k^{ag}$ is the solution of (3.8).

c) Suppose that $\Psi$ is convex and an optimal solution $x^*$ exists for problem (1.1). Then for any $N \ge 1$, we have

$$\Psi(x_N^{ag}) - \Psi(x^*) \le \frac{\Gamma_N\|x_0 - x^*\|^2}{2\lambda_1} + \delta. \tag{3.12}$$


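Proof. We first show part a). By (3.1), we have

$$f(\bar x_k^{ag}) \le f(x_{k-1}^{ag}) + \langle f'(x_{k-1}^{ag}), \bar x_k^{ag} - x_{k-1}^{ag}\rangle + \frac{H}{1+\nu}\|\bar x_k^{ag} - x_{k-1}^{ag}\|^{1+\nu}. \tag{3.13}$$

Analogous to (2.16), by (3.7) and (3.8), we can show that for any $x \in X$,

$$\langle f'(x_{k-1}^{ag}), \bar x_k^{ag} - x\rangle + \chi(\bar x_k^{ag}) \le \chi(x) + \frac{1}{2\beta_k}\left[\|x_{k-1}^{ag} - x\|_{G_k}^2 - \|\bar x_k^{ag} - x\|_{G_k}^2 - \|x_{k-1}^{ag} - \bar x_k^{ag}\|_{G_k}^2\right].$$

Letting $x = x_{k-1}^{ag}$ in the above inequality and summing it up with (3.13), we have

$$\Psi(\bar x_k^{ag}) \le \Psi(x_{k-1}^{ag}) - \frac{1}{\beta_k}\|\bar x_k^{ag} - x_{k-1}^{ag}\|_{G_k}^2 + \frac{H}{1+\nu}\|\bar x_k^{ag} - x_{k-1}^{ag}\|^{1+\nu}. \tag{3.14}$$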


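Now, for $\nu \in [0,1)$, we apply the inequality $ab \le a^p/p + b^q/q$ with $p = \frac{2}{1+\nu}$, $q = \frac{2}{1-\nu}$, and

$$a = \frac{H}{1+\nu}\left[\frac{(1-\nu)k}{2}\right]^{\frac{1-\nu}{2}}\|\bar x^{ag}_k - x^{ag}_{k-1}\|^{1+\nu}, \qquad b = \left[\frac{2}{(1-\nu)k}\right]^{\frac{1-\nu}{2}}.$$

It follows that

$$\frac{H}{1+\nu}\|\bar x^{ag}_k - x^{ag}_{k-1}\|^{1+\nu} = ab \le L(\nu,H)\,k^{\frac{1-\nu}{1+\nu}}\|\bar x^{ag}_k - x^{ag}_{k-1}\|^2 + \frac{1}{k}, \tag{3.15}$$

where

$$L(\nu,H) = \frac{1+\nu}{2}\left(\frac{H}{1+\nu}\right)^{\frac{2}{1+\nu}}\left(\frac{1-\nu}{2}\right)^{\frac{1-\nu}{1+\nu}} = \left(\frac{1-\nu}{1+\nu}\right)^{\frac{1-\nu}{1+\nu}}\left(\frac{H}{2}\right)^{\frac{2}{1+\nu}}. \tag{3.16}$$

Let us define

$$L(1,H) = \lim_{\nu\to 1} L(\nu,H) = \frac{H}{2}. \tag{3.17}$$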
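As a quick numerical sanity check of (3.15) to (3.17), the Python snippet below evaluates the simplified form of $L(\nu,H)$ in (3.16). It then verifies, on a grid of $\nu$, $H$, $k$ and step lengths $t = \|\bar x^{ag}_k - x^{ag}_{k-1}\|$, that the right-hand side of (3.15) dominates the left-hand side. The snippet is our own illustration and the helper names are hypothetical.

```python
def L_const(nu, H):
    """L(nu, H) from (3.16), with the limiting value (3.17) at nu = 1."""
    if nu == 1.0:
        return H / 2.0
    p = (1.0 - nu) / (1.0 + nu)
    return p ** p * (H / 2.0) ** (2.0 / (1.0 + nu))


def young_gap(nu, H, k, t):
    """Right-hand side minus left-hand side of (3.15) for step length t."""
    lhs = H / (1.0 + nu) * t ** (1.0 + nu)
    rhs = L_const(nu, H) * k ** ((1.0 - nu) / (1.0 + nu)) * t ** 2 + 1.0 / k
    return rhs - lhs


worst = min(young_gap(nu, H, k, t)
            for nu in (0.0, 0.3, 0.7, 0.999, 1.0)
            for H in (0.5, 1.0, 10.0)
            for k in (1, 5, 100)
            for t in (1e-3, 0.1, 0.5, 1.0, 3.0, 50.0))
```

A nonnegative `worst` is what (3.15) predicts. Evaluating `L_const` for $\nu$ close to 1 also shows the limit $H/2$ in (3.17).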


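Then, (3.15) holds for all $\nu \in [0,1]$. Combining (3.14) and (3.15), and using $G_k \succeq \sigma I$, we have

$$\Psi(\bar x^{ag}_k) \le \Psi(x^{ag}_{k-1}) - \frac{\sigma - L(\nu,H)k^{\frac{1-\nu}{1+\nu}}\beta_k}{\beta_k}\|\bar x^{ag}_k - x^{ag}_{k-1}\|^2 + \frac{1}{k}. \tag{3.18}$$

Also, by (3.1), (2.2), and (2.4), we have

$$\begin{aligned}
f(\tilde x^{ag}_k) &\le f(x^{md}_k) + \langle f'(x^{md}_k), \tilde x^{ag}_k - x^{md}_k\rangle + \frac{H}{1+\nu}\|\tilde x^{ag}_k - x^{md}_k\|^{1+\nu}\\
&= f(x^{md}_k) + \alpha_k\langle f'(x^{md}_k), x_k - x_{k-1}\rangle + \frac{H\alpha_k^{1+\nu}}{1+\nu}\|x_k - x_{k-1}\|^{1+\nu}\\
&= f(x^{md}_k) + \alpha_k\langle f'(x^{md}_k), x_k - x_{k-1}\rangle - \frac{\alpha_k}{2\lambda_k}\|x_k - x_{k-1}\|^2 + \frac{H\alpha_k^{1+\nu}}{1+\nu}\|x_k - x_{k-1}\|^{1+\nu} + \frac{\alpha_k}{2\lambda_k}\|x_k - x_{k-1}\|^2\\
&\le f(x^{md}_k) + \alpha_k\langle f'(x^{md}_k), x_k - x_{k-1}\rangle + \frac{\alpha_k}{2\lambda_k}\|x_k - x_{k-1}\|^2 + \delta\alpha_k - \frac{\alpha_k\left[1 - 2L(\nu,H)\alpha_k^{\frac{2\nu}{1+\nu}}\lambda_k\delta^{-\frac{1-\nu}{1+\nu}}\right]}{2\lambda_k}\|x_k - x_{k-1}\|^2,
\end{aligned} \tag{3.19}$$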


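where the last inequality is obtained similarly to (3.15), with $1/k$ replaced by $\delta$, and $L(\nu,H)$ is defined in (3.16) and (3.17). Now, observe that if

$$\alpha_k^{\frac{2\nu}{1+\nu}}\lambda_k \le \frac{\delta^{\frac{1-\nu}{1+\nu}}}{2L(\nu,H)} \quad\text{and}\quad \beta_k \le \frac{(2\sigma-\gamma)k^{\frac{\nu-1}{1+\nu}}}{2L(\nu,H)}, \tag{3.20}$$

then (3.18) and (3.19) imply (3.7) and (3.4), respectively. By (3.3) and our setting of $\alpha_k = \lambda_k/\Lambda_k = \lambda_k/(\lambda_k + \Lambda_{k-1})$, we have $\alpha_k\lambda_k = \eta_k$. Hence,

$$\alpha_k^{\frac{2\nu}{1+\nu}}\lambda_k = (\alpha_k\lambda_k)^{\frac{2\nu}{1+\nu}}\lambda_k^{\frac{1-\nu}{1+\nu}} = \eta_k^{\frac{2\nu}{1+\nu}}\lambda_k^{\frac{1-\nu}{1+\nu}}. \tag{3.21}$$

By (3.3), (3.6), and $\gamma_1, \gamma_2 \in (0,1)$, we have

$$\lim_{\tau_{1,k}\to\infty}\eta_k = 0, \qquad \lim_{\eta_k\to 0}\lambda_k = 0, \qquad \lim_{\tau_{2,k}\to\infty}\beta_k = 0$$

for any fixed $k$. Together with (3.21), these limits imply that (3.20) will eventually be satisfied in the line search procedure. Therefore, (3.7) and (3.4) will eventually be satisfied as well. So the line search procedures in Step 1 and Step 2 of Algorithm 2 are well-defined and finite.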
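The finiteness argument rests on two identities. The first is $\alpha_k\lambda_k = \eta_k$, which holds because $\lambda_k$ in (3.3) is taken as the positive root of $\lambda^2 = \eta_k(\lambda + \Lambda_{k-1})$; we assume this root form as our reading of (3.3). The second is (3.21). The Python sketch below, with hypothetical helper names, checks both identities along a sequence of backtracking trials $\eta = \hat\lambda_k\gamma_1^{\tau}$. It also shows that the left side of (3.21) decreases toward zero, so the first condition in (3.20) is eventually met.

```python
import math


def lam_from_eta(eta, Lam_prev):
    """Positive root of lam^2 = eta * (lam + Lam_prev), our reading of (3.3)."""
    return (eta + math.sqrt(eta * eta + 4.0 * eta * Lam_prev)) / 2.0


def check_identities(eta, Lam_prev, nu):
    lam = lam_from_eta(eta, Lam_prev)
    alpha = lam / (Lam_prev + lam)                 # alpha_k = lambda_k / Lambda_k
    s = 2.0 * nu / (1.0 + nu)
    lhs = alpha ** s * lam                         # left side of (3.21)
    rhs = eta ** s * lam ** ((1.0 - nu) / (1.0 + nu))
    return alpha * lam, lhs, rhs


gamma1, lam_hat, Lam_prev, nu = 0.5, 1.0, 3.0, 0.4
# Left side of (3.21) along the trials eta = lam_hat * gamma1**tau.
trace = [check_identities(lam_hat * gamma1 ** tau, Lam_prev, nu)[1] for tau in range(40)]
```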


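We now show part b). Noting (3.7) and (3.9), and in view of (2.7), we have

$$\Psi(x^{ag}_k) \le \Psi(\bar x^{ag}_k) \le \Psi(x^{ag}_{k-1}) - \frac{\gamma}{2\beta_k}\|\bar x^{ag}_k - x^{ag}_{k-1}\|^2 + \frac{1}{k} = \Psi(x^{ag}_{k-1}) - \frac{\gamma\beta_k}{2}\|g_{X,k}\|^2 + \frac{1}{k}.$$

Summing up the above inequalities for $k$ from $\lfloor N/2\rfloor + 1$ to $N$ and re-arranging the terms, we obtain

$$\begin{aligned}
\min_{k=1,\dots,N}\|g_{X,k}\|^2\,\frac{\gamma}{2}\sum_{k=\lfloor N/2\rfloor+1}^{N}\beta_k
&\le \min_{k=\lfloor N/2\rfloor+1,\dots,N}\|g_{X,k}\|^2\,\frac{\gamma}{2}\sum_{k=\lfloor N/2\rfloor+1}^{N}\beta_k
\le \frac{\gamma}{2}\sum_{k=\lfloor N/2\rfloor+1}^{N}\beta_k\|g_{X,k}\|^2\\
&\le \Psi(x^{ag}_{\lfloor N/2\rfloor}) - \Psi(x^{ag}_N) + \sum_{k=\lfloor N/2\rfloor+1}^{N}\frac{1}{k}
\le \Psi(x_0) - \Psi^* + \sum_{k=\lfloor N/2\rfloor+1}^{N}\frac{1}{k},
\end{aligned} \tag{3.22}$$

where the last inequality follows from (3.9) and the fact that $\Psi^* \le \Psi(x^{ag}_N)$. Dividing both sides of the above inequality by $\frac{\gamma}{2}\sum_{k=\lfloor N/2\rfloor+1}^{N}\beta_k$, we clearly obtain (3.11).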

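We now show part c). By (2.2), (2.4), and (3.4), we have

$$\begin{aligned}
f(\tilde x^{ag}_k) &\le f(x^{md}_k) + \alpha_k\langle f'(x^{md}_k), x_k - x_{k-1}\rangle + \frac{\alpha_k}{2\lambda_k}\|x_k - x_{k-1}\|^2 + \delta\alpha_k\\
&= f(x^{md}_k) + \langle f'(x^{md}_k), \tilde x^{ag}_k - x^{md}_k\rangle + \frac{\alpha_k}{2\lambda_k}\|x_k - x_{k-1}\|^2 + \delta\alpha_k.
\end{aligned} \tag{3.23}$$

Combining the above inequality with (3.9), and using the convexity of $f$ and $\mathcal{X}$ as in (2.20), we have for any $x \in X$

$$\Psi(x^{ag}_k) \le (1-\alpha_k)\Psi(x^{ag}_{k-1}) + \alpha_k f(x) + \alpha_k\langle f'(x^{md}_k), x_k - x\rangle + \frac{\alpha_k}{2\lambda_k}\|x_k - x_{k-1}\|^2 + \alpha_k\mathcal{X}(x_k) + \delta\alpha_k.$$

Adding the above inequality to (2.21) (with both of its sides multiplied by $\alpha_k$), we have

$$\Psi(x^{ag}_k) \le (1-\alpha_k)\Psi(x^{ag}_{k-1}) + \alpha_k\Psi(x) + \frac{\alpha_k}{2\lambda_k}\left[\|x_{k-1} - x\|^2 - \|x_k - x\|^2\right] + \delta\alpha_k. \tag{3.24}$$

Also, by (3.5) and (2.9), we can easily show that

$$\Gamma_k = \frac{\lambda_1}{\sum_{i=1}^{k}\lambda_i} \quad\text{and}\quad \frac{\alpha_k}{\lambda_k\Gamma_k} = \frac{1}{\lambda_1} \qquad \forall k \ge 1. \tag{3.25}$$

Subtracting $\Psi(x)$ from both sides of (3.24) and dividing them by $\Gamma_k$, it follows from Lemma 1 that for any $x \in X$ we have

$$\frac{\Psi(x^{ag}_N) - \Psi(x)}{\Gamma_N} \le \sum_{k=1}^{N}\frac{\alpha_k}{2\lambda_k\Gamma_k}\left[\|x_{k-1} - x\|^2 - \|x_k - x\|^2\right] + \delta\sum_{k=1}^{N}\frac{\alpha_k}{\Gamma_k} \le \frac{\|x_0 - x\|^2}{2\lambda_1} + \frac{\delta}{\Gamma_N}, \tag{3.26}$$

where the second inequality follows from (3.25) and the fact that

$$\sum_{k=1}^{N}\frac{\alpha_k}{\Gamma_k} = \frac{1}{\Gamma_1} + \sum_{k=2}^{N}\frac{1}{\Gamma_k}\left(1 - \frac{\Gamma_k}{\Gamma_{k-1}}\right) = \frac{1}{\Gamma_1} + \sum_{k=2}^{N}\left(\frac{1}{\Gamma_k} - \frac{1}{\Gamma_{k-1}}\right) = \frac{1}{\Gamma_N}. \tag{3.27}$$

Then, (3.12) follows immediately from (3.26) with $x = x^*$. ■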
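The identities (3.25) and (3.27) are easy to confirm numerically. The Python snippet below is our own illustration. It assumes $\alpha_k = \lambda_k/\Lambda_k$ as in (3.5) and the recursion $\Gamma_1 = 1$, $\Gamma_k = (1-\alpha_k)\Gamma_{k-1}$, which is our reading of (2.9). It draws arbitrary positive stepsizes and checks three facts: $\Gamma_k = \lambda_1/\Lambda_k$, $\alpha_k/(\lambda_k\Gamma_k) = 1/\lambda_1$, and the telescoping identity $\sum_k \alpha_k/\Gamma_k = 1/\Gamma_N$.

```python
import random

random.seed(0)
lams = [random.uniform(0.1, 2.0) for _ in range(25)]   # arbitrary stepsizes lambda_k

Lam, Gamma, alphas, Gammas = 0.0, None, [], []
for k, lam in enumerate(lams, start=1):
    Lam += lam
    alpha = lam / Lam                                  # (3.5): alpha_k = lambda_k / Lambda_k
    Gamma = 1.0 if k == 1 else (1.0 - alpha) * Gamma   # our reading of (2.9)
    alphas.append(alpha)
    Gammas.append(Gamma)

telescoped = sum(a / g for a, g in zip(alphas, Gammas))   # left side of (3.27)
```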

In the next result we specify the convergence rates of the UPFAG algorithm. 

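Corollary 5 Let $\{x^{ag}_k\}$ be the iterates generated by Algorithm 2. Suppose there exist some constants $\lambda > 0$ and $\beta > 0$ such that the initial stepsizes satisfy $\hat\lambda_k \ge \lambda$ and $\hat\beta_k \ge \beta$ for all $k \ge 1$.

a) Suppose that $\Psi$ is bounded below over $X$, i.e., $\Psi^*$ is finite. Then, for any $N \ge 1$, we have

$$\min_{k=1,\dots,N}\|g_{X,k}\|^2 \le \frac{8\left[\Psi(x_0) - \Psi^* + 1\right]}{\gamma\gamma_2}\left[\frac{8L(\nu,H)}{(2\sigma-\gamma)N^{\frac{2\nu}{1+\nu}}} + \frac{1}{\beta N}\right]. \tag{3.28}$$

b) Suppose that $f$ is convex and an optimal solution $x^*$ exists for problem (1.1). Then for any $N \ge 1$, we have

$$\Psi(x^{ag}_N) - \Psi(x^*) \le \frac{4\|x_0 - x^*\|^2}{\gamma_1 N^{\frac{1+3\nu}{1+\nu}}}\left[\frac{2L(\nu,H)}{\delta^{\frac{1-\nu}{1+\nu}}} + \frac{1}{\lambda}\right] + \delta. \tag{3.29}$$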


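Proof. We have $\hat\lambda_k \ge \lambda$ and $\hat\beta_k \ge \beta$ for all $k \ge 1$, and condition (3.20) guarantees that the line searches stop. Hence it follows from (3.21) and $\eta_k = \alpha_k\lambda_k$ that

$$\beta_k \ge \gamma_2\min\left\{\frac{(2\sigma-\gamma)k^{\frac{\nu-1}{1+\nu}}}{2L(\nu,H)}, \beta\right\} \quad\text{and}\quad \alpha_k^{\frac{2\nu}{1+\nu}}\lambda_k \ge \gamma_1\min\left\{\frac{\delta^{\frac{1-\nu}{1+\nu}}}{2L(\nu,H)}, \lambda\alpha_k^{\frac{\nu-1}{1+\nu}}\right\}. \tag{3.30}$$

We also use the bounds

$$\sum_{k=\lfloor N/2\rfloor+1}^{N}k^{\frac{1-\nu}{1+\nu}} \le \int_{\lfloor N/2\rfloor+1}^{N+1}x^{\frac{1-\nu}{1+\nu}}\,dx \le 4N^{\frac{2}{1+\nu}}, \qquad \sum_{k=\lfloor N/2\rfloor+1}^{N}\frac{1}{k} \le 1. \tag{3.31}$$

Combining (3.30) and (3.31) with the arithmetic-harmonic mean inequality, we obtain

$$\begin{aligned}
\sum_{k=\lfloor N/2\rfloor+1}^{N}\beta_k &\ge \sum_{k=\lfloor N/2\rfloor+1}^{N}\gamma_2\min\left\{\frac{(2\sigma-\gamma)k^{\frac{\nu-1}{1+\nu}}}{2L(\nu,H)}, \beta\right\} \ge \frac{\gamma_2N^2}{4\sum_{k=\lfloor N/2\rfloor+1}^{N}\max\left\{\frac{2L(\nu,H)k^{\frac{1-\nu}{1+\nu}}}{2\sigma-\gamma}, \beta^{-1}\right\}}\\
&\ge \frac{\gamma_2N^2}{4\sum_{k=\lfloor N/2\rfloor+1}^{N}\left\{2(2\sigma-\gamma)^{-1}L(\nu,H)k^{\frac{1-\nu}{1+\nu}} + \beta^{-1}\right\}} \ge \frac{\gamma_2}{4\left(8(2\sigma-\gamma)^{-1}L(\nu,H)N^{-\frac{2\nu}{1+\nu}} + (\beta N)^{-1}\right)}.
\end{aligned} \tag{3.32}$$

Combining the above relation with (3.11), we clearly obtain (3.28).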
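The two elementary bounds in (3.31) and the arithmetic-harmonic mean step used in (3.32) can be spot-checked as follows. This Python snippet is only an illustration over a finite range of $N$ and $\nu$, not a proof, and the helper names are hypothetical.

```python
def check_31(N, nu):
    """Return whether both bounds in (3.31) hold for the given N and nu."""
    r = (1.0 - nu) / (1.0 + nu)
    ks = range(N // 2 + 1, N + 1)
    power_sum = sum(k ** r for k in ks)
    harmonic_tail = sum(1.0 / k for k in ks)
    return power_sum <= 4.0 * N ** (2.0 / (1.0 + nu)), harmonic_tail <= 1.0


def am_hm_gap(values):
    """sum(values) - n^2 / sum(1/values); nonnegative by the AM-HM inequality."""
    n = len(values)
    return sum(values) - n * n / sum(1.0 / v for v in values)


results = [check_31(N, nu) for N in range(1, 400) for nu in (0.0, 0.25, 0.5, 1.0)]
```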

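Now, observing (3.30) and the facts that $\alpha_k \in (0,1]$ and $\nu \in (0,1]$, we have

$$\alpha_k^{\frac{2\nu}{1+\nu}}\lambda_k \ge \gamma_1\min\left\{\frac{\delta^{\frac{1-\nu}{1+\nu}}}{2L(\nu,H)}, \lambda\right\}.$$

Together with (3.25), this implies

$$\lambda_k \ge \left(\frac{\lambda_1}{\Gamma_k}\right)^{\frac{2\nu}{1+3\nu}}\left(\gamma_1\min\left\{\frac{\delta^{\frac{1-\nu}{1+\nu}}}{2L(\nu,H)}, \lambda\right\}\right)^{\frac{1+\nu}{1+3\nu}}.$$

Define $c = \frac{1+\nu}{1+3\nu}$. Using this observation together with (2.9) and (3.25), we have

$$\frac{1}{\Gamma_k^c} - \frac{1}{\Gamma_{k-1}^c} = \frac{1 - (1-\alpha_k)^c}{\Gamma_k^c} \ge \frac{c\,\alpha_k}{\Gamma_k^c} = \frac{c\,\lambda_k\Gamma_k^{1-c}}{\lambda_1} \ge c\left(\frac{\gamma_1}{\lambda_1}\min\left\{\frac{\delta^{\frac{1-\nu}{1+\nu}}}{2L(\nu,H)}, \lambda\right\}\right)^c,$$

where the first inequality follows from the fact that $1 - (1-a)^c \ge ca$ for all $a \in (0,1]$ and $c \in [1/2,1]$. By the above inequality and $\Gamma_1 = 1$, arguing as in (3.32), for any $N \ge 1$ we obtain

$$\frac{1}{\Gamma_N^c} \ge \frac{c\,\gamma_1^c N^2}{\lambda_1^c\sum_{k=1}^{N}\left\{\left[2L(\nu,H)\delta^{-\frac{1-\nu}{1+\nu}}\right]^c + \lambda^{-c}\right\}} = \frac{c\,\gamma_1^c N}{\lambda_1^c\left\{\left[2L(\nu,H)\delta^{-\frac{1-\nu}{1+\nu}}\right]^c + \lambda^{-c}\right\}}.$$

We also have $c^{1/c} \ge 1/4$ and $(a+b)^{1/c} \le 2(a^{1/c} + b^{1/c})$ for any $a, b > 0$. Together with the above bound, these facts imply

$$\Gamma_N \le \frac{8\lambda_1}{\gamma_1 N^{\frac{1+3\nu}{1+\nu}}}\left[\frac{2L(\nu,H)}{\delta^{\frac{1-\nu}{1+\nu}}} + \frac{1}{\lambda}\right].$$

Combining the above relation with (3.12), we clearly obtain (3.29). ■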


We now add a few remarks about the results obtained in Corollary 5. First, for simplicity let us assume $\beta \ge (2\sigma - \gamma)/L(\nu,H)$. Then, by (3.28), we conclude the following bound on the number of iterations the UPFAG method needs to reach $\|g_{X,k}\|^2 \le \epsilon$ for at least one $k$, after disregarding some constants:¹

$$\mathcal{O}\left(H^{\frac{1}{\nu}}\left[\frac{\Psi(x_0) - \Psi^*}{\epsilon}\right]^{\frac{1+\nu}{2\nu}}\right). \tag{3.33}$$


Suppose the scaling matrix $G_k$ is also uniformly bounded from above, i.e., $G_k \preceq MI$ for some constant $M > 0$. Then $\|g_{X,k}\|$ is equivalent, up to constant factors, to the norm of the projected gradient defined in (2.7). Hence, when $\nu = 1$ and $G_k$ is uniformly bounded from above, the above bound reduces to the best known iteration complexity in (2.28) for the class of nonconvex functions with Lipschitz continuous gradient.

¹ This complexity bound was also derived for the gradient descent method in a homework assignment given by the second author in Spring 2014. One of the class participants later summarized and refined it in [?]. However, that development requires the problem to be unconstrained and the parameters $H$ and $\nu$ to be given a priori.

Second, choose $\delta = \epsilon/2$ and assume that $\lambda \ge \delta^{\frac{1-\nu}{1+\nu}}/L(\nu,H)$. Then (3.29) implies that the UPFAG method can find a solution $\bar x$ such that $\Psi(\bar x) - \Psi(x^*) \le \epsilon$, after disregarding some constants, in at most

$$\mathcal{O}\left(\left[\frac{H\|x_0 - x^*\|^{1+\nu}}{\epsilon}\right]^{\frac{2}{1+3\nu}}\right) \tag{3.34}$$

iterations, which is optimal for convex programming [22]. If $\nu = 1$, the above bound reduces to (2.29), obtained by the UAG method for the class of convex functions with Lipschitz continuous gradient. Note that (3.34) is of the same order as the bound obtained by the universal fast gradient method proposed by Nesterov [26] for convex optimization problems.
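To see how the two rates depend on the smoothness level $\nu$, the Python helpers below evaluate both orders. The helper names are hypothetical, and constants are dropped exactly as in (3.33) and (3.34). For $\nu = 1$ the orders reduce to $H(\Psi(x_0)-\Psi^*)/\epsilon$ and $\sqrt{H\|x_0-x^*\|^2/\epsilon}$, respectively.

```python
def nonconvex_order(H, gap, eps, nu):
    """Order of (3.33): H^(1/nu) * (gap/eps)^((1+nu)/(2nu)), constants dropped."""
    return H ** (1.0 / nu) * (gap / eps) ** ((1.0 + nu) / (2.0 * nu))


def convex_order(H, dist, eps, nu):
    """Order of (3.34): (H * dist^(1+nu) / eps)^(2/(1+3nu)), constants dropped."""
    return (H * dist ** (1.0 + nu) / eps) ** (2.0 / (1.0 + 3.0 * nu))
```

For instance, with $H = 1$, unit gap and $\epsilon = 10^{-4}$, the nonconvex order grows from $10^4$ at $\nu = 1$ to $10^{10}$ at $\nu = 1/4$. This illustrates how quickly the guarantee degrades for weakly smooth problems.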

Finally, it is interesting to find the number of gradient computations at each iteration of Algorithm 2, assuming the initial stepsizes satisfy $\hat\lambda_k \ge \lambda$ and $\hat\beta_k \ge \beta$ for all $k \ge 1$. According to (3.6) and (3.30), after disregarding some constants, $\tau_{2,k} \le \log k$. This implies that the number of gradient computations at point $x^{ag}_{k-1}$ is bounded by $\log k$. Similarly, the number of gradient computations at point $x^{md}_k$ is also bounded by $\log k$. Hence, the total number of gradient computations at the $k$-th iteration is bounded by $2\log k$. On the other hand, suppose that we choose $\hat\beta_k$ and $\hat\lambda_k$ according to (3.10). Then, we have

$$\tau_{1,k} = \frac{\log\eta_k - \log\eta_{k-1}}{\log\gamma_1} \quad\text{and}\quad \tau_{2,k} = \frac{\log\beta_k - \log\beta_{k-1}}{\log\gamma_2}.$$

So the numbers of gradient evaluations in Step 1 and Step 2 at the $k$-th iteration are bounded by

$$1 + \tau_{1,k} = 1 + \frac{\log\eta_k - \log\eta_{k-1}}{\log\gamma_1} \quad\text{and}\quad 1 + \tau_{2,k} = 1 + \frac{\log\beta_k - \log\beta_{k-1}}{\log\gamma_2}.$$

Consequently, the total numbers of gradient evaluations in Step 1 and Step 2 are bounded by

$$N_\eta = N + \sum_{k=1}^{N}\tau_{1,k} = N + \frac{\log\eta_N - \log\eta_0}{\log\gamma_1} \quad\text{and}\quad N_\beta = N + \sum_{k=1}^{N}\tau_{2,k} = N + \frac{\log\beta_N - \log\beta_0}{\log\gamma_2}.$$

Note that (3.30) implies $N_\eta \le N + c_1$ and $N_\beta \le N + c_2\log N$ for some positive constants $c_1$ and $c_2$. Hence, the average number of gradient computations per iteration is bounded by a constant. This is smaller than the logarithmic bound $\log k$ obtained above for $\hat\beta_k$ and $\hat\lambda_k$ chosen according to (3.3) and (3.6). However, with (3.3) and (3.6), the algorithm allows more freedom in choosing the initial stepsizes.
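The telescoping bound on $N_\eta$ can be illustrated with a toy model of the backtracking in Step 1. The Python sketch below is purely illustrative. It replaces the actual acceptance test (3.4) by a hypothetical rule: accept once $\eta$ falls below a given threshold. It then compares warm-started trials as in (3.10) with trials that restart from the same initial stepsize at every iteration. The thresholds and function names are our own assumptions.

```python
import math


def warm_start_counts(thresholds, eta0=1.0, gamma1=0.5):
    """Backtracking with eta_k = eta_{k-1} * gamma1**tau_k, as in (3.10).

    thresholds[k-1] is a hypothetical largest acceptable eta at iteration k.
    Returns (total number of trials, list of accepted etas).
    """
    eta, total, etas = eta0, 0, []
    for thr in thresholds:
        tau = 0
        while eta > thr:
            eta *= gamma1
            tau += 1
        total += 1 + tau          # one evaluation per trial, i.e. 1 + tau_{1,k}
        etas.append(eta)
    return total, etas


def cold_start_counts(thresholds, eta0=1.0, gamma1=0.5):
    """Same toy acceptance rule, but every iteration restarts from eta0."""
    total = 0
    for thr in thresholds:
        eta, tau = eta0, 0
        while eta > thr:
            eta *= gamma1
            tau += 1
        total += 1 + tau
    return total


thr = [1.0 / math.sqrt(k) for k in range(1, 201)]   # slowly shrinking acceptance level
N = len(thr)
total, etas = warm_start_counts(thr)
formula = N + (math.log(etas[-1]) - math.log(1.0)) / math.log(0.5)
```

In this toy run the warm-started count matches the telescoped expression for $N_\eta$ and stays close to $N$. The cold-started count grows with $\log k$ at each iteration.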

One possible drawback of the UPFAG method is that the accuracy $\delta$ must be fixed before running the algorithm. Changing it requires rerunning the algorithm from the beginning. Moreover, two line search procedures are needed to find $\beta_k$ and $\lambda_k$ in each iteration. In the next section, we address these issues by presenting some problem-parameter free bundle-level type algorithms. These algorithms do not require a fixed target accuracy in advance and perform only one line search procedure in each iteration.


4 Unified bundle-level type methods 

Our goal in this section is to generalize bundle-level type methods, originally designed for convex programming, to a class of possibly nonconvex nonlinear programming problems. Specifically, in Subsection 4.1 we introduce a unified bundle-level type method by incorporating a local search step into an accelerated prox-level method. We establish its convergence properties under the assumption that the feasible set is bounded. In Subsection 4.2 we simplify this algorithm and provide fast variants for ball-constrained problems and for unconstrained problems with bounded level sets.

4.1 Unified accelerated prox-level method 

In this subsection, we generalize the accelerated prox-level (APL) method presented by Lan [16] to solve the class of nonlinear programming problems given in (1.1), where $f$ has Hölder continuous gradient on $X$. Lan [16] showed that the APL method is uniformly optimal for solving problem (1.1) when $f$ is convex and has Hölder continuous gradient. Here, we combine the framework of this algorithm with a gradient descent step. The result is a unified accelerated prox-level (UAPL) method for solving both convex and nonconvex optimization problems.

As in other bundle-level type methods, we first introduce some basic definitions related to the objective function and the feasible set. We define a function $h(y,x)$ for a given $y \in X$ as

$$h(y,x) = f(y) + \langle f'(y), x - y\rangle + \mathcal{X}(x) \quad \text{for any } x \in X. \tag{4.1}$$

If $f$ is convex, then $h(y,x) \le f(x) + \mathcal{X}(x) = \Psi(x)$, and hence $h(y,x)$ defines a lower bound for $\Psi(x)$. Also, let $S_\Psi(l)$ be the level set of $\Psi$ given by $S_\Psi(l) = \{x \in X : \Psi(x) \le l\}$. A convex compact set $X'$ is called a localizer of the level set $S_\Psi(l)$ if it satisfies $S_\Psi(l) \subseteq X' \subseteq X$.

Then, it can be shown [16] that, when $\Psi$ is convex, $\min\{l, \underline{h}(y)\} \le \Psi(x)$ for any $x \in X$, where

$$\underline{h}(y) = \min\{h(y,x) : x \in X'\}. \tag{4.2}$$
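The Python snippet below illustrates (4.1) and (4.2) under our own assumptions: $\mathcal{X} \equiv 0$, a smooth convex test function, and a box localizer $X'$. Over a box, the affine function $h(y,\cdot)$ is minimized at a vertex. The snippet checks $\underline{h}(y) \le h(y,x) \le \Psi(x)$, as well as $\min\{l, \underline{h}(y)\} \le \Psi(x)$, for points sampled from $X'$. All names are hypothetical.

```python
import math
import random


def f(x):
    # Convex test function (chi == 0 assumed): sum of exp(x_i) + 0.5 * x_i^2.
    return sum(math.exp(t) + 0.5 * t * t for t in x)


def grad(x):
    return [math.exp(t) + t for t in x]


def h(y, x):
    """Linearization (4.1) with chi == 0."""
    g = grad(y)
    return f(y) + sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))


def h_lower(y, lo, hi):
    """(4.2) for a box X' = [lo, hi]: the affine h(y, .) is minimized at a vertex."""
    g = grad(y)
    vertex = [l if gi >= 0 else u for gi, l, u in zip(g, lo, hi)]
    return h(y, vertex)


random.seed(1)
lo, hi = [-1.0, -2.0, 0.0], [1.0, 0.5, 2.0]
y = [0.3, -1.0, 1.5]
samples = [[random.uniform(a, b) for a, b in zip(lo, hi)] for _ in range(1000)]
hy = h_lower(y, lo, hi)
level = hy + 1.0   # an arbitrary level l for the min{l, h_lower(y)} check
```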

Using the above definitions, we present a unified accelerated prox-level (UAPL) algorithm for nonlinear programming, Algorithm 3. We make the following remarks about this algorithm.

First, the computation of $\bar x^{ag}_k$ in step 3 of the UAPL algorithm is essentially a gradient descent step. Without this update, the UAPL algorithm would reduce to a simple variant of the APL method in [16] for convex programming. However, this update is required to establish convergence when the objective function is nonconvex. Second, the UAPL algorithm has two nested loops. Each pass of the outer loop is called a phase and is counted by index $s$. The iterations of the inner loop within each phase are counted by index $t$. If we make progress in reducing the gap between the lower and upper bounds on $\Psi$, we terminate the current phase (inner loop) in step 4 and go to the next one with a new lower bound. As shown in [16], the number of steps in each phase is finite when $\Psi$ is convex. However, when $\Psi$ is nonconvex, $\underline{\Psi}_t$ is not necessarily a lower bound on $\Psi$, and hence the termination criterion in step 4 may never be satisfied. In this case, we can still establish the convergence of the algorithm in terms of the projected gradient defined in (2.7), thanks to the gradient descent step incorporated in step 3.

The following lemma, due to Lan [16], shows some properties of the UAPL algorithm; it generalizes results for the prox-level method in [3].

Lemma 6 Assume that $\Psi$ is convex and bounded below over $X$, i.e., $\Psi^*$ is finite. Then, the following statements hold for each phase of Algorithm 3:

a) $\{X'_t\}_{t\ge 0}$ is a sequence of localizers of the level set $S_\Psi(l)$.

b) $\underline{\Psi}_0 \le \underline{\Psi}_1 \le \cdots \le \underline{\Psi}_t \le \Psi^* \le \bar\Psi_t \le \cdots \le \bar\Psi_0$ for any $t \ge 1$.

c) $\emptyset \ne \underline{X}_t \subseteq \bar X_t$ for any $t \ge 1$. Hence, Step 5 is always feasible unless the current phase terminates.

Algorithm 3 The unified accelerated prox-level (UAPL) algorithm

Input: $p_0 \in X$, $\alpha_t \in (0,1)$ with $\alpha_1 = 1$, and algorithmic parameter $\eta \in (0,1)$.

Set $p_1 \in \mathrm{Argmin}_{x\in X}\, h(p_0, x)$, $lb_1 = h(p_0, p_1)$, $x^{ag}_0 = p_0$, and $k = 0$.

For $s = 1, 2, \dots$:

Set $x^{ag}_0 = p_s$, $\bar\Psi_0 = \Psi(x^{ag}_0)$, $\underline{\Psi}_0 = lb_s$, and $l = \eta\underline{\Psi}_0 + (1-\eta)\bar\Psi_0$. Also, choose $x_0 \in X$ and the initial localizer $X'_0$ arbitrarily, say $x_0 = p_s$ and $X'_0 = X$.

For $t = 1, 2, \dots$:

1. Update lower bound: set $x^{md}_t = (1-\alpha_t)x^{ag}_{t-1} + \alpha_t x_{t-1}$ and $\underline{\Psi}_t := \max\{\underline{\Psi}_{t-1}, \min\{l, \underline{h}_t\}\}$, where $\underline{h}_t \equiv \underline{h}(x^{md}_t)$ is defined in (4.2) with $X' = X'_{t-1}$.

2. Update the prox-center: set

$$x_t = \mathrm{argmin}_{x\in X'_{t-1}}\left\{\|x - x_0\|^2 : h(x^{md}_t, x) \le l\right\}. \tag{4.3}$$

If (4.3) is infeasible, set $x_t = x^{ag}_{t-1}$.

3. Update upper bound: set $k \leftarrow k+1$ and choose $x^{ag}_t$ such that $\Psi(x^{ag}_t) = \min\{\Psi(x^{ag}_{t-1}), \Psi(\tilde x^{ag}_t), \Psi(\bar x^{ag}_k)\}$, where $\tilde x^{ag}_t = (1-\alpha_t)x^{ag}_{t-1} + \alpha_t x_t$ and $\bar x^{ag}_k$ is obtained by Step 2 of Algorithm 2. Set $\bar\Psi_t = \Psi(x^{ag}_t)$ and $x^{ag}_k = x^{ag}_t$.

4. Termination: If $\underline{\Psi}_t \le \bar\Psi_t$ and $\bar\Psi_t - \underline{\Psi}_t \le \left[1 - \frac{1}{2}\min\{\eta, 1-\eta\}\right](\bar\Psi_0 - \underline{\Psi}_0)$, then terminate this phase (loop) and set $p_{s+1} = x^{ag}_t$ and $lb_{s+1} = \underline{\Psi}_t$.

5. Update localizer: choose an arbitrary $X'_t$ such that $\underline{X}_t \subseteq X'_t \subseteq \bar X_t$, where

$$\underline{X}_t = \left\{x \in X'_{t-1} : h(x^{md}_t, x) \le l\right\} \quad\text{and}\quad \bar X_t = \left\{x \in X : \langle x_t - x_0, x - x_t\rangle \ge 0\right\}. \tag{4.4}$$

End

End


Now, we can present the main convergence properties of the above UAPL algorithm. 


Theorem 7 Let the feasible set $X$ be bounded.

a) Suppose $\Psi$ is bounded below over $X$, i.e., $\Psi^*$ is finite. Then, after disregarding some constants, the total number of iterations the UAPL method performs to reach $\|g_{X,k}\|^2 \le \epsilon$ for at least one $k$ is bounded by (3.33).

b) Suppose that $f$ is convex and an optimal solution $x^*$ exists for problem (1.1). Then, the number of phases the UAPL method performs to find a solution $\bar x$ with $\Psi(\bar x) - \Psi(x^*) \le \epsilon$ is bounded by

$$S(\epsilon) = \max\left\{0, \left\lceil\log_{\frac{1}{q}}\frac{H\max_{x,y\in X}\|x-y\|^{1+\nu}}{(1+\nu)\epsilon}\right\rceil\right\}, \tag{4.5}$$

where

$$q = 1 - \frac{1}{2}\min\{\eta, 1-\eta\}. \tag{4.6}$$

In addition, by choosing $\alpha_t = 2/(t+1)$ for any $t \ge 1$, the total number of iterations to find the aforementioned solution $\bar x$ is bounded by

$$S(\epsilon) + \frac{1}{1 - q^{\frac{2}{3\nu+1}}}\left(\frac{4\sqrt{3}\,H\max_{x,y\in X}\|x-y\|^{1+\nu}}{\eta\theta(1+\nu)\epsilon}\right)^{\frac{2}{1+3\nu}}. \tag{4.7}$$
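The phase bound (4.5) and (4.6) is a direct formula. The Python helper below evaluates it; the name is hypothetical and the helper is only an illustration. It reads the ceiling of $\log_{1/q}(\cdot)$ literally and returns 0 when the ratio inside the logarithm does not exceed one.

```python
import math


def num_phases(H, nu, diam, eps, eta):
    """S(eps) from (4.5) and (4.6); diam stands for max_{x,y in X} ||x - y||."""
    q = 1.0 - 0.5 * min(eta, 1.0 - eta)
    ratio = H * diam ** (1.0 + nu) / ((1.0 + nu) * eps)
    if ratio <= 1.0:
        return 0
    return max(0, math.ceil(math.log(ratio) / math.log(1.0 / q)))
```

With $\eta = 1/2$, (4.6) gives the smallest possible value $q = 3/4$, i.e., the largest guaranteed gap reduction per phase. For example, `num_phases(1.0, 1.0, 2.0, 1e-3, 0.5)` evaluates $\lceil\log_{4/3} 2000\rceil = 27$.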


Proof. First, part a) can be established by essentially following the same arguments as those in Theorem 4 b) and Corollary 5 a). Second, due to the termination criterion in step 4 of Algorithm 3, for any phase $s \ge 1$, we have

$$\Psi(p_{s+1}) - lb_{s+1} \le q\left[\Psi(p_s) - lb_s\right].$$

By induction, and together with the facts that $lb_1 = h(p_0,p_1)$ and $lb_s \le \Psi(x^*)$, this clearly implies

$$\Psi(p_s) - \Psi(x^*) \le \left[\Psi(p_1) - h(p_0,p_1)\right]q^{s-1}. \tag{4.8}$$

As shown in Theorem 4 of [16] for convex programming, this relation implies (4.5). Third, (4.7) follows from (4.5) and [16, Proposition 2, Theorem 3]. ■

We now add a few remarks about the above results. First, the UAPL and UPFAG methods essentially share the same mechanism for ensuring global convergence when problem (1.1) is nonconvex. To the best of our knowledge, this is the first time that a bundle-level type method is proposed for solving a class of possibly nonconvex nonlinear programming problems. Second, the bound in (4.7) is of the same order of magnitude as the optimal bound in (3.34) for convex programming. However, obtaining this bound requires the feasible set $X$ to be bounded, although the target accuracy need not be given a priori. Third, parts a) and b) of Theorem 7 imply that the UAPL method can uniformly solve weakly smooth nonconvex and convex problems without requiring any problem parameters. In particular, it achieves the best known convergence rate for nonconvex problems, and its convergence rate is optimal if the problem turns out to be convex.

Finally, in steps 1 and 2 of the UAPL algorithm, we need to solve two subproblems which can be 
time consuming. Moreover, to establish the convergence of this algorithm, we need the boundedness 
assumption on X as mentioned above. In the next subsection, we address these issues by exploiting 
the framework of another bundle-level type method which can significantly reduce the iteration cost. 


4.2 Unified fast accelerated prox-level method 

In this subsection, we aim to simplify the UAPL method for solving ball-constrained problems and unconstrained problems with bounded level sets. Recently, Chen et al. [6] presented a simplified variant of the APL method, namely the fast APL (FAPL) method, for ball-constrained convex problems. They reduced the number of subproblems in each iteration from two to one and presented an exact method to solve the remaining subproblem.

In this subsection, we first generalize the FAPL method to ball-constrained nonconvex problems and then discuss how to extend it to unconstrained problems with bounded level sets. Throughout this subsection, we assume that the simple nonsmooth convex term vanishes from the objective function, i.e., $\mathcal{X} \equiv 0$ in (1.1).

Below we present the unified FAPL (UFAPL) method for solving problem (1.1) with the ball constraint, i.e., $X = B(\bar x, R)$.


Algorithm 4 The unified fast accelerated prox-level (UFAPL) algorithm

Input: $p_0 \in B(\bar x, R)$ and $\eta, \theta \in (0,1)$.

Set $p_1 \in \mathrm{Argmin}_{x\in B(\bar x,R)}\, h(p_0, x)$, $lb_1 = h(p_0, p_1)$, $x^{ag}_0 = p_0$, and $k = 0$.

For $s = 1, 2, \dots$:

Set $x^{ag}_0 = p_s$, $\bar\Psi_0 = \Psi(x^{ag}_0)$, $\underline{\Psi}_0 = lb_s$, $l = \eta\underline{\Psi}_0 + (1-\eta)\bar\Psi_0$, and $X'_0 = X = \mathbb{R}^n$. Also, let $x_0 \in B(\bar x, R)$ be arbitrarily given.

For $t = 1, 2, \dots$:

1. Set $x^{md}_t = (1-\alpha_t)x^{ag}_{t-1} + \alpha_t x_{t-1}$ and define $\underline{X}_t$ as in (4.4).

2. Update the prox-center: let $x_t$ be computed by (4.3) with $x_0 = \bar x$, i.e., $x_t = \mathrm{argmin}_{x\in\underline{X}_t}\|x - \bar x\|^2$.

3. Update the lower bound: set $k \leftarrow k+1$ and choose $x^{ag}_t$ such that $\Psi(x^{ag}_t) = \min\{\Psi(x^{ag}_{t-1}), \Psi(\bar x^{ag}_k)\}$, where $\bar x^{ag}_k$ is obtained by Step 2 of Algorithm 2 with $X = B(\bar x, R)$. If $\underline{X}_t = \emptyset$ or $\|x_t - \bar x\| > R$, then terminate this phase with $x^{ag}_k = x^{ag}_t$, $p_{s+1} = x^{ag}_t$, and $lb_{s+1} = l$.

4. Update upper bound: let $\tilde x^{ag}_t = (1-\alpha_t)x^{ag}_{t-1} + \alpha_t x_t$. If $\Psi(\tilde x^{ag}_t) < \Psi(x^{ag}_t)$, then set $x^{ag}_k = x^{ag}_t = \tilde x^{ag}_t$. Set $\bar\Psi_t = \Psi(x^{ag}_t)$. If $\bar\Psi_t \le l + \theta(\bar\Psi_0 - l)$, then terminate this phase (loop) and set $p_{s+1} = x^{ag}_t$ and $lb_{s+1} = lb_s$.

5. Update localizer: choose any polyhedral $X'_t$ such that $\underline{X}_t \subseteq X'_t \subseteq \bar X_t$, where $\underline{X}_t$ and $\bar X_t$ are defined in (4.4) with $X = \mathbb{R}^n$.

End

End


We now add a few remarks about the above algorithm. First, the UFAPL method does not need to solve the subproblem (4.2). Moreover, the subproblem (4.3) in the UFAPL method amounts to projecting $\bar x$ onto a closed polyhedron, and quite efficient methods exist for such projections (see e.g., [?]). When $G_k = I$, the subproblem associated with finding $\bar x^{ag}_k$ in step 3 of the UFAPL method has a closed-form solution. Hence, in this case only one subproblem has to be solved in each iteration of the UFAPL method, and it can be solved quite efficiently.
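When $G_k = I$ and $\mathcal{X} \equiv 0$, the step that produces $\bar x^{ag}_k$ over $X = B(\bar x, R)$ is a gradient step followed by a Euclidean projection onto the ball. The Python sketch below gives this closed form. For comparison, it also includes the projection onto a single half-space, the simplest polyhedral set of the kind appearing in (4.3). Projecting onto an intersection of several half-spaces is harder, which is why the specialized methods mentioned above are needed. All function names are hypothetical.

```python
import math


def project_ball(z, center, R):
    """Euclidean projection of z onto the ball B(center, R)."""
    d = [zi - ci for zi, ci in zip(z, center)]
    norm = math.sqrt(sum(t * t for t in d))
    if norm <= R:
        return list(z)
    return [ci + R * t / norm for ci, t in zip(center, d)]


def gradient_step_on_ball(x, g, beta, center, R):
    """argmin_{u in B} <g, u> + ||u - x||^2 / (2*beta), assuming G_k = I and chi == 0."""
    return project_ball([xi - beta * gi for xi, gi in zip(x, g)], center, R)


def project_halfspace(z, a, b):
    """Projection onto {u : <a, u> <= b}, the one-cut polyhedral case."""
    viol = sum(ai * zi for ai, zi in zip(a, z)) - b
    if viol <= 0:
        return list(z)
    aa = sum(ai * ai for ai in a)
    return [zi - viol * ai / aa for zi, ai in zip(z, a)]
```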

Second, the UFAPL algorithm can terminate a phase in either step 3 or step 4. Moreover, by combining the convergence results in [6] with techniques similar to those used in the analysis above, we can establish complexity results for the UFAPL method similar to those in Theorem 7. For simplicity, we do not repeat these arguments here.

Finally, we can extend the above results for the UFAPL method to unconstrained problems. In particular, suppose the level set

$$S_0 = \{x \in \mathbb{R}^n \mid \Psi(x) \le \Psi(x_0)\}$$

is bounded, where $x_0$ is the initial point for the UFAPL method. Now, consider the ball-constrained problem

$$\min_{x \in B(x_0, R)} \Psi(x), \tag{4.9}$$

where $R = \max_{x,y\in S_0}\|x - y\| + \delta$ for a given $\delta > 0$.

To solve this problem, we can apply the UFAPL method with small modifications. Specifically, we use $X = \mathbb{R}^n$ to find $\bar x^{ag}_k$ in step 3 of this method. Let $\{x^{ag}_k\}_{k\ge1}$ be generated by this modified UFAPL method. By steps 3 and 4 of this method, we clearly have $\Psi(x^{ag}_k) \le \Psi(x^{ag}_{k-1})$ for all $k \ge 1$, which implies that $x^{ag}_k \in S_0$ for all $k \ge 1$. Hence, since $\delta > 0$, all generated points $\{x^{ag}_k\}_{k\ge1}$ lie in the interior of the ball $B(x_0, R)$. Consequently, the optimal solution of problem (4.9) is the same as that of the unconstrained problem $\min_{x\in\mathbb{R}^n}\Psi(x)$. Therefore, under the boundedness assumption on $S_0$, we can solve the original unconstrained problem by applying the above modified UFAPL method to the ball-constrained problem (4.9).

5 Numerical Experiments 

In this section, we show the performance of our algorithms for solving the following two problems, 
namely, the least square problem with nonconvex regularization term and the sigmoid support vector 
machine (SVM) problem. 

5.1 Nonconvex regularized least square problem 

In our first experiment, we consider the following least square problem with a smoothly clipped absolute deviation (SCAD) penalty term given in [7]:

$$\min_{\|x\| \le 1} \Psi(x) := \frac{1}{2}\|Ax - b\|^2 + m\sum_{j=1}^{n}p_\lambda(|x_j|), \tag{5.1}$$

where $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$. The penalty term $p_\lambda : \mathbb{R}_+ \to \mathbb{R}$ satisfies $p_\lambda(0) = 0$, and its derivative is given by

$$p'_\lambda(\beta) = \lambda\left\{I(\beta \le \lambda) + \frac{\max(0, a\lambda - \beta)}{(a-1)\lambda}I(\beta > \lambda)\right\}$$

for some constant parameters $a > 2$ and $\lambda > 0$. Here, $I(\cdot)$ is the indicator function. As can be seen, $p_\lambda(|\cdot|)$ is nonconvex and non-differentiable at 0. Therefore, we replace $p_\lambda$ by its smooth nonconvex approximation $q_\lambda : \mathbb{R}_+ \to \mathbb{R}$, which satisfies $q_\lambda(0) = 0$ and whose derivative is defined by

Note that (5.1), with the regularization term $p_\lambda$ replaced by $q_\lambda$, fits the setting of problem (1.1), where $\mathcal{X} \equiv 0$ and $\Psi(x) = f(x)$ has a Lipschitz continuous gradient.
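For concreteness, the Python sketch below implements the displayed derivative $p'_\lambda$ and its antiderivative $p_\lambda$. For $p_\lambda$ it uses the standard closed form of the SCAD penalty, which we state here as our own assumption; it is consistent with $p_\lambda(0) = 0$. The defaults are the values $a = 3.7$ and $\lambda = 0.01$ used in this experiment. The smooth approximation $q_\lambda$ used in the experiments is not reproduced here, and the function names are hypothetical.

```python
def scad_derivative(beta, lam=0.01, a=3.7):
    """p'_lambda(beta) for beta >= 0, as displayed above."""
    if beta <= lam:
        return lam
    return max(0.0, a * lam - beta) / (a - 1.0)


def scad_penalty(beta, lam=0.01, a=3.7):
    """Closed-form antiderivative of scad_derivative with p_lambda(0) = 0."""
    if beta <= lam:
        return lam * beta
    if beta <= a * lam:
        return (2.0 * a * lam * beta - beta * beta - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0
```

The penalty is linear near zero, quadratic on $(\lambda, a\lambda]$, and constant beyond $a\lambda$. This flat tail is what makes $p_\lambda(|\cdot|)$ nonconvex and avoids over-penalizing large coefficients.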

In this experiment, the elements of $A$ are drawn randomly from the standard normal distribution, and $b$ is obtained by $b = A\bar x + \xi$. Here, $\xi \sim N(0, \sigma^2)$ is random noise independent of $A$, and the coefficient vector $\bar x$ defines the true linear relationship between the rows of $A$ and $b$. Also, we set the parameters to $a = 3.7$ and $\lambda = 0.01$, and the noise level to $\sigma = 0.1$.

We consider three different problem sizes, $n = 2000$, $4000$, and $8000$ with $m = 1000$, $2000$, and $4000$, respectively. For each problem size, 10 different instances $(b, A, \bar x, \xi)$ were generated using the aforementioned approach. We implement the following algorithms:

- UAG;
- UPFAG with partial line search, whose stepsizes are set according to (3.10) with $\hat\lambda_1 = \hat\beta_1 = 1/L$, where $L$ is an estimate of the Lipschitz constant of $f'$;
- UPFAG with full line search, denoted by UPFAG-full, whose stepsizes are set according to (3.3) and (3.6) with $\hat\lambda_k = \hat\beta_k = 1/L$ for all $k \ge 1$;
- UAPL and UFAPL;
- the projected gradient method (PG) described after the presentation of Algorithm 1.

Table 1 and Table 2 summarize the average results of this experiment over 10 instances for each problem size, with initial point $x^{ag}_0 = 0$.

The following observations can be made from the numerical results. First, the projected gradient method performs the worst among all the compared algorithms. Second, UAG and the variants of the UPFAG method perform similarly, and the type of line search does not have a significant effect on the results. One possible reason is that only a few line searches are needed when solving this problem. Third, the UAPL and UFAPL methods perform similarly in terms of iterations and objective value. However, as expected, UAPL takes significantly longer CPU time because it solves two subproblems at each iteration. Finally, the bundle-level type methods outperform the accelerated gradient methods. The same observation has been made for convex problems (see e.g., [?]).


5.2 Nonconvex support vector machine problem 

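In our second experiment, we consider a support vector machine problem with the nonconvex sigmoid loss function [20], that is

$$\min_{\|x\| \le a} \Psi(x) := \frac{1}{m}\sum_{i=1}^{m}\left[1 - \tanh\left(v_i\langle x, u_i\rangle\right)\right] + \lambda\|x\|_2^2 \tag{5.2}$$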

for some $\lambda > 0$. The first term in the objective function is smooth, nonconvex, and has Lipschitz continuous gradient. Hence, this problem also fits into the setting of problem (1.1) with $\mathcal{X} \equiv 0$. Here, we assume that each data point $(u_i, v_i)$ is drawn from the uniform distribution on $[0,1]^n \times \{-1, 1\}$, where $u_i \in \mathbb{R}^n$ is the feature vector and $v_i \in \{-1,1\}$ denotes the corresponding label. Moreover, we assume that $u_i$ is sparse with 5% nonzero components and that $v_i = \mathrm{sign}(\langle \bar x, u_i\rangle)$ for some $\bar x \in \mathbb{R}^n$ with $\|\bar x\| \le a$. In this experiment, we set $\lambda = 0.01$ and $a = 50$. We consider two problem sizes, $n = 2000$ and $4000$ with $m = 1000$ and $2000$, respectively. The initial point is chosen randomly within the ball of radius $a$ centered at the origin. As in the first experiment, we report the average results of running different algorithms over 10 instances for each problem size. To further assess the quality of the generated solutions, we also report the classification error of the classifier $x$, given by

$$er(x) := \frac{\left|\{i : v_i \ne \mathrm{sign}(\langle x, u_i\rangle),\ i = 1,\dots,K\}\right|}{K}, \tag{5.3}$$

where $K = 10000$ for each problem instance. Table 3 summarizes the results of this experiment over 10 instances for each problem size.
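The Python sketch below computes the objective, its gradient, and the classification error (5.3). It assumes the form of (5.2) shown above, i.e., the average sigmoid loss plus $\lambda\|x\|_2^2$. It omits the ball constraint $\|x\| \le a$ and uses the convention $\mathrm{sign}(0) = 0$. The function names are hypothetical.

```python
import math


def sigmoid_svm(x, U, v, lam):
    """Objective and gradient of the assumed form of (5.2):
    (1/m) * sum_i [1 - tanh(v_i <x, u_i>)] + lam * ||x||^2."""
    m = len(U)
    val = lam * sum(t * t for t in x)
    g = [2.0 * lam * t for t in x]
    for u, vi in zip(U, v):
        z = vi * sum(a * b for a, b in zip(x, u))
        val += (1.0 - math.tanh(z)) / m
        coef = -vi * (1.0 - math.tanh(z) ** 2) / m   # d/dz tanh(z) = 1 - tanh(z)^2
        g = [gj + coef * uj for gj, uj in zip(g, u)]
    return val, g


def classification_error(x, U, v):
    """er(x) in (5.3): fraction of labels that differ from sign(<x, u_i>)."""
    def sign(t):
        return 1 if t > 0 else (-1 if t < 0 else 0)
    wrong = sum(1 for u, vi in zip(U, v)
                if vi != sign(sum(a * b for a, b in zip(x, u))))
    return wrong / len(U)
```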

Similar to the previous experiment, it can be seen that the PG method again performs the worst 
among all the compared algorithms and the accelerated prox-level methods also outperform the 
other algorithms. The UFAPL has the best performance in terms of the iteration, runtime and the 
classification error. 
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Table 1: Average required number of iterations ( Iter(k )), runtime (T(s)), and the best objective 
value obtained by different algorithms till finding a solution x* satisfying a desired accuracy for the 
projected gradient over 10 instances of the nonconvex regularized least square problem. 


lls*(r)|| 2 

< 10° 

< 10' 1 

< 10' 2 

< 10' 3 

< 10~ 4 

< 10' 5 




m = 1000, 

n = 2000 





Iter(fc) 

1645 

3011 

> 3324 




PG 

T(s) 

11.7 

21.3 

> 23.7 





*0O 

5.76e+01 

5.73e+01 

5.73e+01 

5.73e+01 

5.73e+01 

5.73e+01 


Iter(fc) 

368 

540 

607 

652 

752 

816 

UAG 

T(s) 

5.2 

7.6 

8.6 

9.2 

10.7 

11.6 


H< 9 ) 

5.73e+01 

5.71e+01 

5.71e+01 

5.71e+01 

5.71e+01 

5.71e+01 


Iter(fc) 

323 

414 

483 

659 

767 

831 

UPFAG 

T(s) 

4.6 

5.9 

6.9 

9.5 

11.1 

12.0 



5.74e+01 

5.73e+01 

5.73e+01 

5.73e+01 

5.72e+01 

5.72e+01 


Iter(fc) 

323 

414 

483 

659 

767 

831 

UPFAG-full 

T(s) 

4.6 

6.0 

6.9 

9.6 

11.2 

12.1 


*W 9 ) 

5.74e+01 

5.73e+01 

5.73e+01 

5.73e+01 

5.72e+01 

5.72e+01 


Iter(fc) 

214 

275 

332 

414 

455 

520 

UAPL 

T(s) 

52.9 

68.7 

83.2 

103.9 

114.3 

130.7 


®K 9 ) 

5.75e+01 

5.74e+01 

5.74e+01 

5.74e+01 

5.74e+01 

5.74e+01 


Iter(fc) 

209 

312 

361 

426 

472 

517 

UFAPL 

T(s) 

4.4 

6.6 

7.6 

8.8 

9.8 

10.7 


*W 9 ) 

5.72e+01 

5.71e+01 

5.71e+01 

5.71e+01 

5.71e+01 

5.71e+01 




m = 2000, 

n -- 4000 





Iter(fc) 

2030 

5340 

> 7000 




PG 

T(s) 

64.4 

170.2 

> 225.7 





*W 9 ) 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 


Iter(fc) 

415 

529 

683 

897 

1190 

1527 

UAG 

T(s) 

26.3 

33.7 

43.3 

57.0 

75.7 

96.9 


H< 9 ) 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 


Iter(fc) 

434 

535 

751 

962 

1253 

1591 

UPFAG 

T(s) 

27.8 

34.1 

47.4 

60.8 

79.4 

101.0 


*K 9 ) 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 


Iter(fc) 

434 

535 

751 

962 

1253 

1591 

UPFAG-full 

T(s) 

28.0 

34.4 

48.2 

61.7 

80.1 

101.4 


n< 9 ) 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 


Iter(fc) 

234 

404 

514 

672 

858 

1013 

UAPL 

T(s) 

152.8 

264.6 

337.0 

441.0 

564.2 

666.6 


*(x a k 9 ) 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 


Iter(fc) 

242 

354 

490 

547 

596 

767 

UFAPL 

T(s) 

20.5 

29.7 

41.1 

45.8 

49.9 

63.7 


®K 9 ) 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 

1.60e+02 
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Table 2: Average required number of iterations ( Iter[k )), runtime (T(s)), and the best objective 
value obtained by different algorithms till finding a solution x* satisfying a desires accuracy for 
the projected gradient over 10 instances of the nonconvex regularized least square problem with 
m = 4000, n = 8000. 


||g_X(x*)||^2            < 10^0    < 10^-1   < 10^-2   < 10^-3   < 10^-4   < 10^-5

PG           Iter(k)     1364      2742      >3471
             T(s)        169.5     341.1     >430.6
             Ψ(x_k^ag)   3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02

UAG          Iter(k)     208       299       487       757       1184      1689
             T(s)        50.9      73.6      119.8     186.8     292.5     416.6
             Ψ(x_k^ag)   3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02

UPFAG        Iter(k)     204       297       483       767       1179      1693
             T(s)        50.0      73.0      119.3     188.7     289.9     416.4
             Ψ(x_k^ag)   3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02

UPFAG-full   Iter(k)     204       297       483       767       1179      1693
             T(s)        50.7      73.6      119.7     190.2     291.3     418.1
             Ψ(x_k^ag)   3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02

UAPL         Iter(k)     146       203       339       434       598       721
             T(s)        322.1     448.8     754.2     967.2     1332.5    1611.4
             Ψ(x_k^ag)   3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02

UFAPL        Iter(k)     124       198       309       362       561       677
             T(s)        38.8      62.1      97.0      113.7     175.8     212.1
             Ψ(x_k^ag)   3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02  3.79e+02
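The accuracy thresholds in these tables are stated in terms of the squared norm of the projected gradient, ||g_X(x*)||^2. The sketch below shows one common way to compute such a quantity. It assumes a Euclidean prox term, a zero simple convex term, and a box-shaped feasible set X = [lo, hi]^n, none of which is taken from the experiments reported here, and the step size gamma is a free parameter. Under these assumptions, g_X(x) = (x - x^+)/gamma, where x^+ is the projection of x - gamma * ∇f(x) onto X. The helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def projected_gradient_box(x: Vector, grad: Vector, lo: float, hi: float,
                           gamma: float) -> Vector:
    """Projected gradient g_X(x) = (x - x_plus) / gamma for a box X = [lo, hi]^n.

    Assumes the simple convex term is zero and the prox term is Euclidean, so
    x_plus is the Euclidean projection of x - gamma * grad onto the box.
    """
    x_plus = [min(max(xi - gamma * gi, lo), hi) for xi, gi in zip(x, grad)]
    return [(xi - pi) / gamma for xi, pi in zip(x, x_plus)]


def squared_norm(v: Vector) -> float:
    """Squared Euclidean norm, the quantity compared against 10^0, ..., 10^-5."""
    return sum(vi * vi for vi in v)


# Constrained example: the second coordinate sits on the upper bound and the
# gradient pushes it outward, so that component of g_X vanishes.
g_box = projected_gradient_box([0.5, 1.0], [0.2, -0.3], 0.0, 1.0, gamma=1.0)

# Unconstrained example: with infinite bounds the projected gradient is the gradient.
g_free = projected_gradient_box([0.5, 1.0], [0.2, -0.3], -math.inf, math.inf, gamma=0.5)

# Stopping test in the style of the tables, e.g. ||g_X(x)||^2 < 10^-1.
print(squared_norm(g_box) < 1e-1, squared_norm(g_free) < 1e-1)
```

When X is the whole space, x^+ = x - gamma * ∇f(x) and the projected gradient reduces to ∇f(x), so the stopping test becomes a test on the plain gradient norm.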


6 Concluding Remarks 

In this paper, we extend the framework of uniformly optimal algorithms, originally designed for convex
programming, to nonconvex nonlinear programming. In particular, by incorporating a gradient descent
step into the framework of uniformly optimal convex programming methods, namely, accelerated gradient
and bundle-level type methods, and by enforcing the function values evaluated at each iteration of these
methods to be non-increasing, we present unified algorithms for minimizing composite objective
functions given by the sum of a function f with Hölder continuous gradient and a simple convex term
over a convex set. We show that these algorithms exhibit the best known convergence rate when f is
nonconvex and possess the optimal convergence rate if f turns out to be convex. Therefore, these
algorithms allow us to treat nonlinear programming problems in a unified manner, regardless of their
smoothness level and convexity. Furthermore, we show that the gradient descent step can be replaced
by Quasi-Newton steps to possibly improve the practical performance of these algorithms. Numerical
experiments are also presented to illustrate the performance of the developed algorithms on a couple
of nonlinear programming problems in data analysis.
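The mechanism summarized above (an accelerated step, a plain gradient descent step, and a selection rule that keeps the function values non-increasing) can be illustrated with a short sketch. It is a simplified stand-in, not the UPFAG or UFAPL methods studied in this paper: it assumes an unconstrained problem with no simple convex term and a gradient that is Lipschitz continuous with a known constant L (the methods in this paper need no such parameter and cover Hölder continuous gradients), and it uses Nesterov-style momentum in place of the paper's accelerated gradient step. The function monotone_accelerated_gradient and the nonconvex test function are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def monotone_accelerated_gradient(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    L: float,
    iters: int,
) -> Tuple[Vector, List[float]]:
    """Accelerated candidate plus gradient-descent safeguard, monotone selection.

    Assumes grad is L-Lipschitz with L known, and no constraints or extra terms.
    """
    step = 1.0 / L
    x_prev = list(x0)
    x_best = list(x0)      # safeguarded iterate, loosely analogous to x_k^ag
    f_best = f(x_best)
    history = [f_best]
    for k in range(1, iters + 1):
        # Accelerated candidate: Nesterov-style extrapolation, then a gradient step.
        momentum = (k - 1) / (k + 2)
        y = [xb + momentum * (xb - xp) for xb, xp in zip(x_best, x_prev)]
        x_acc = [yi - step * gi for yi, gi in zip(y, grad(y))]
        # Gradient descent step from the current best point; by the descent
        # lemma it lowers f by at least ||grad f(x_best)||^2 / (2L).
        x_gd = [xi - step * gi for xi, gi in zip(x_best, grad(x_best))]
        # Monotone selection: keep whichever candidate has the smaller value.
        f_acc, f_gd = f(x_acc), f(x_gd)
        x_prev = x_best
        if f_acc <= f_gd:
            x_best, f_best = x_acc, f_acc
        else:
            x_best, f_best = x_gd, f_gd
        history.append(f_best)
    return x_best, history


# Hypothetical nonconvex test function: f(x) = sum_i (x_i^2 / 2 + 2 cos x_i).
# Its second derivative 1 - 2 cos x_i lies in [-1, 3], so L = 3 is valid.
def f_test(x: Vector) -> float:
    return sum(0.5 * t * t + 2.0 * math.cos(t) for t in x)


def grad_test(x: Vector) -> Vector:
    return [t - 2.0 * math.sin(t) for t in x]


x_final, values = monotone_accelerated_gradient(f_test, grad_test, [0.5, -2.0], L=3.0, iters=300)
print(x_final, values[-1])
```

Because the gradient step taken from the current best point lowers f by at least ||∇f||^2/(2L), the recorded values cannot increase (up to floating-point rounding) whether or not f is convex, while the momentum candidate is accepted only when it does better. Swapping in a Quasi-Newton step would change how x_gd is computed; the decrease guarantee would then have to come from that step itself, for example through a line search.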

Table 3: Average required number of iterations (Iter(k)), runtime (T(s)), objective value, and
classification error found until reaching a desired accuracy for ||g_X(x*)||^2 over 10 instances of
the sigmoid SVM problem.


||g_X(x*)||^2            < 10^0    < 10^-1   < 10^-2   < 10^-3   < 10^-4   < 10^-5

m = 1000, n = 2000


PG           Iter(k)     1         6         805       1646      2223      2939
             T(s)        0.0       0.0       3.1       6.2       8.4       …
             Ψ(x_k^ag)   … 7.71e-1 … 1.76e-1 1.70e-1
             er(x_k^ag)  … 46.22 40.50 … 31.17 30.96


UAG          Iter(k)     1         5         94        195       311       559
             T(s)        0.0       0.1       0.7       1.5       2.4       4.2
             Ψ(x_k^ag)   4.27e+0   4.19e+0   4.17e-1   2.09e-01  1.98e-01  1.67e-01
             er(x_k^ag)  47.53     45.91     32.78     31.41     30.99     30.86


UPFAG        Iter(k)     1         4         92        192       314       583
             T(s)        0.0       0.0       0.7       1.5       2.5       5.2
             Ψ(x_k^ag)   4.27e+0   4.19e+0   4.19e-01  2.10e-01  … 1.97e-01
             er(x_k^ag)  47.53     45.94     32.75     … 30.95


UPFAG-full   Iter(k)     1         4         92        192       314       583
             T(s)        … 0.0 …
             … 45.94 32.75 31.42 … 30.95 … 13.6 …

UAPL         Ψ(x_k^ag)   1.04e+01 …
             er(x_k^ag)  … 42.70 39.28 31.51 31.22 31.23


UFAPL        Iter(k)     1         3         27        42        64        100
             T(s)        0.0       0.1       0.5       0.7       1.0       1.6
             Ψ(x_k^ag)   4.27e+0   3.99e+0   4.01e-01  2.17e-01  2.00e-01  1.98e-01
             er(x_k^ag)  47.53     45.04     35.04     31.59     31.09     30.97

m = 2000, n = 4000


PG           Iter(k)     1         47        1648      3389      4853      6324
             T(s)        0.0       0.9       31.2      64.2      91.9      119.5
             Ψ(x_k^ag)   5.57e+0   4.93e+0   8.41e-01  2.48e-01  2.07e-01  2.01e-01
             er(x_k^ag)  53.61     36.17     31.22     26.39     25.29     25.28


UAG          Iter(k)     1         15        121       232       356       584
             T(s)        0.0       0.6       4.6       8.9       13.6      22.2
             Ψ(x_k^ag)   5.57e+0   4.93e+0   6.62e-01  3.15e-01  2.91e-01  2.90e-01
             er(x_k^ag)  53.61     35.51     27.46     26.15     26.02     25.89


UPFAG        Iter(k)     1         14        118       237       374       648
             T(s)        0.0       0.5       4.5       9.2       14.4      27.4
             Ψ(x_k^ag)   5.57e+0   4.94e+0   6.69e-01  3.17e-01  2.92e-01  2.90e-01
             er(x_k^ag)  53.61     35.59     27.45     26.14     26.00     25.91


UPFAG-full   Iter(k)     1         14        118       235       360       635
             T(s)        0.0       0.5       4.5       9.5       14.5      52.0
             Ψ(x_k^ag)   5.57e+0   4.94e+0   6.69e-01  3.15e-01  2.91e-01  2.90e-01
             er(x_k^ag)  53.61     35.59     27.45     26.14     26.00     25.91


UAPL         Iter(k)     1         5         11        26        41        98
             T(s)        0.6       2.8       6.5       15.8      25.2      62.3
             Ψ(x_k^ag)   9.60e+0   2.88e+0   5.98e-01  3.63e-01  3.44e-01  3.43e-01
             er(x_k^ag)  32.44     32.41     28.54     26.48     26.23     26.35


UFAPL        Iter(k)     1         4         27        45        66        92
             T(s)        … 0.1 0.3 1.5 2.5 …
             Ψ(x_k^ag)   … 2.34e-01
             er(x_k^ag)  53.61 … 26.87